Software defect prediction using stacked denoising autoencoders and two-stage ensemble learning

Bibliographic details
Published in: Information and Software Technology, Volume 96, pp. 94-111
Main authors: Tong, Haonan; Liu, Bin; Wang, Shihai
Format: Journal Article
Language: English
Publication details: Elsevier B.V., 01.04.2018
ISSN: 0950-5849, 1873-6025
Description
Summary: Software defect prediction (SDP) plays an important role in allocating testing resources reasonably, reducing testing costs, and ensuring software quality. However, the software metrics used for SDP are almost entirely traditional features rather than deep representations (DPs) learned by deep learning. Although stacked denoising autoencoders (SDAEs) are powerful for feature learning and have been successfully applied in other fields, to the best of our knowledge, they have not been investigated in the field of SDP. Meanwhile, class imbalance remains a pressing problem. In this paper, we propose a novel SDP approach, SDAEsTSE, which takes advantage of SDAEs and of an ensemble learning method, the proposed two-stage ensemble (TSE). The method has two phases: a deep learning phase and a TSE phase. We first use SDAEs to extract the DPs from the traditional software metrics, and then apply TSE, a novel ensemble learning approach, to address the class-imbalance problem. Experiments are performed on 12 NASA datasets to demonstrate the effectiveness of DPs, the proposed TSE, and SDAEsTSE, respectively. Performance is evaluated in terms of F-measure, the area under the curve (AUC), and the Matthews correlation coefficient (MCC). Generally, DPs, TSE, and SDAEsTSE yield significantly higher performance than the corresponding traditional metrics, classic ensemble methods, and benchmark SDP models. It can be concluded that (1) deep representations are promising for SDP compared with traditional software metrics, (2) TSE is more effective at addressing the class-imbalance problem in SDP than classic ensemble learning methods, and (3) the proposed SDAEsTSE is significantly effective for SDP.
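The three evaluation measures named in the summary can be illustrated with a minimal sketch. It is not taken from the paper: it assumes that "F-measure" means the balanced F1 score, that label 1 marks a defective module, and that AUC refers to the area under the ROC curve, computed here by pairwise comparison. The function names `f_measure`, `mcc`, and `auc` are hypothetical.

```python
import math


def f_measure(tp, fp, fn):
    """Balanced F1: harmonic mean of precision and recall (assumed variant)."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


def auc(scores, labels):
    """ROC AUC as the chance that a defective module outscores a clean one.

    Ties count as half a win. Quadratic in the number of modules, which is
    acceptable for a small illustration.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one defective and one clean module")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

All three measures remain informative when defective modules are rare, whereas plain accuracy can look high for a model that labels every module as clean.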
DOI:10.1016/j.infsof.2017.11.008