ConCPDP: A Cross‐Project Defect Prediction Method Integrating Contrastive Pretraining and Category Boundary Adjustment.

Detailed bibliography
Title: ConCPDP: A Cross‐Project Defect Prediction Method Integrating Contrastive Pretraining and Category Boundary Adjustment.
Authors: Song, Hengjie; Pan, Yufei; Guo, Feng; Zhang, Xue; Ma, Le; Jiang, Siyu; Galli, Antonio
Source: IET Software (Wiley-Blackwell); 11/13/2024, Vol. 2024, p1-19, 19p
Subjects: DATA augmentation, FEATURE extraction, COMPUTER software testing, STATISTICAL correlation, PREDICTION models, DEEP learning
Abstract: Software defect prediction (SDP) is a crucial phase preceding the launch of software products. Cross‐project defect prediction (CPDP) was introduced to predict defects in new projects that lack defect labels. CPDP uses defect information from mature projects to speed up defect prediction for new projects, so that developers can quickly obtain defect information for the new project and target their testing accordingly. At present, the predominant approaches to CPDP rely on deep learning, and the performance of the final model is notably affected by the quality of the training dataset. However, CPDP datasets not only contain few samples but also have almost no label information for new projects, so general deep‐learning‐based CPDP models perform poorly. In addition, most current CPDP models do not fully consider the enrichment of classification boundary samples after cross‐domain transfer, leading to suboptimal predictive capability. To overcome these obstacles, we present contrastive learning pretraining for CPDP (ConCPDP), a CPDP method integrating contrastive pretraining and category boundary adjustment. We first perform data augmentation on the source and target domain code files and then parse the augmented data into abstract syntax trees (ASTs). Each AST is then transformed into an integer sequence using specific mapping rules, serving as input for the subsequent neural network. A neural network based on bidirectional long short‐term memory (Bi‐LSTM) receives the integer sequence and outputs a feature vector. The feature vectors are then fed into the contrastive module to optimise the feature extraction network. The pretrained feature extractor is then fine‐tuned using the maximum mean discrepancy (MMD) between the feature distributions of the source and target domains, together with the binary classification loss on the source domain.
Extensive experiments on the PROMISE dataset, which is commonly used for CPDP, validate ConCPDP's efficacy: it achieves superior results in terms of F1 measure, area under the curve (AUC), and Matthews correlation coefficient (MCC). [ABSTRACT FROM AUTHOR]
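Note (not part of the author's abstract): two steps named in the abstract, mapping an AST to an integer sequence and measuring the MMD between source and target features, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not state the mapping rules, the kernel, or the parser, so the sketch uses Python's standard `ast` module as a stand-in parser, assigns integers to node-type names in order of first appearance, and uses a Gaussian kernel with a hypothetical `gamma` parameter. The function names `ast_to_int_sequence` and `mmd2` are hypothetical and do not come from the paper.

```python
import ast
import math


def ast_to_int_sequence(source, vocab):
    """Map AST node-type names to integers (illustrative mapping rule only).

    `vocab` is a dict shared across files so that the same node type always
    maps to the same integer. Index 0 is left free, e.g. for padding.
    """
    sequence = []
    for node in ast.walk(ast.parse(source)):  # breadth-first traversal
        name = type(node).__name__
        if name not in vocab:
            vocab[name] = len(vocab) + 1
        sequence.append(vocab[name])
    return sequence


def gaussian_kernel(x, y, gamma):
    """k(x, y) = exp(-gamma * ||x - y||^2) for two equal-length vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mmd2(source_feats, target_feats, gamma=1.0):
    """Biased estimate of the squared MMD between two sets of feature vectors."""
    def mean_kernel(xs, ys):
        total = sum(gaussian_kernel(x, y, gamma) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (mean_kernel(source_feats, source_feats)
            + mean_kernel(target_feats, target_feats)
            - 2.0 * mean_kernel(source_feats, target_feats))


# Usage: one shared vocabulary for all files, then compare feature sets.
vocab = {}
tokens = ast_to_int_sequence("total = price * quantity", vocab)
gap = mmd2([[0.0, 0.0], [1.0, 0.0]], [[5.0, 5.0], [6.0, 5.0]])
```

In this sketch the MMD estimate is zero when both feature sets are identical and grows as the two sets move apart, which is the property that makes it usable as a domain-alignment term during fine‐tuning.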
Copyright of IET Software (Wiley-Blackwell) is the property of Wiley-Blackwell and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Complementary Index