An Unsupervised Software Fault Prediction Approach Using Threshold Derivation


Bibliographic Details
Published in:	IEEE Transactions on Reliability, Vol. 71, no. 2, pp. 911-932
Main Authors:	Kumar, Rakesh; Chaturvedi, Amrita; Kailasam, Lakshmanan
Format:	Journal Article
Language:	English
Published:	New York: IEEE, 01.06.2022; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:	Algorithms; Clustering; Codes; Correlation coefficients; Datasets; Derivation; Fault diagnosis; Labeling; Measurement; Metric threshold; post-hoc Nemenyi test; Prediction models; Predictive models; Quality assurance; random forest (RF); Software; software fault prediction; Software metrics; Training; unsupervised learning
ISSN:	0018-9529, 1558-1721

Description
Summary:	Software fault prediction models help the software quality assurance team allocate resources optimally during software maintenance. Most recently proposed fault prediction approaches work only on labeled datasets. Several threshold-based software fault prediction approaches have also been proposed recently. However, these approaches do not take the distribution of software metrics into account when deriving metric thresholds; hence, they perform poorly. To fill this gap, we develop an automated fault prediction approach, namely threshold clustering labeling/threshold clustering labeling plus (TCLP), which does not need a labeled dataset. It identifies faulty and nonfaulty artifacts in unlabeled datasets by self-learning. Our proposed approach extends the state-of-the-art technique known as CLAMI. Unlike CLAMI, we derive the metric thresholds using a logarithmic transformation. Thereafter, we label the instances into binary classes (faulty/nonfaulty) using the metric threshold values. TCLP takes this approach one step further by performing fault prediction with a random forest algorithm. An empirical evaluation on 28 datasets (with different numbers of metrics and different granularity) collected from five software groups shows that the proposed unsupervised method obtains significantly better results than state-of-the-art methods. The proposed approach improves on the performance of CLAMI in terms of accuracy, F-measure, and the Matthews correlation coefficient.
DOI:	10.1109/TR.2022.3151125
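
The labeling idea in the summary (derive a threshold per metric, then label instances by how many thresholds they exceed) and the three reported measures can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not give the threshold formula, so `log_threshold` below uses an assumed rule, exp(mean(log(1 + x))) - 1, and the median-based cut-off in `label_instances` is likewise an assumption in the spirit of CLAMI-style labeling. All function names are hypothetical, and the random forest step of TCLP is omitted. Only the accuracy, F-measure, and Matthews correlation coefficient formulas in `evaluate` are the standard definitions.

```python
import math
from statistics import mean, median


def log_threshold(values):
    """Assumed rule (not from the paper): back-transformed mean of log(1 + x)."""
    return math.exp(mean(math.log1p(v) for v in values)) - 1.0


def label_instances(rows):
    """Label each row of metric values as faulty (True) or nonfaulty (False).

    Assumption: an instance is faulty when it exceeds more metric
    thresholds than the median instance does.
    """
    columns = list(zip(*rows))
    thresholds = [log_threshold(col) for col in columns]
    # Per instance, count how many metrics lie above their threshold.
    violations = [sum(v > t for v, t in zip(row, thresholds)) for row in rows]
    cutoff = median(violations)
    return [k > cutoff for k in violations]


def evaluate(actual, predicted):
    """Accuracy, F-measure, and Matthews correlation coefficient (standard formulas)."""
    pairs = list(zip(actual, predicted))
    tp = sum(a and p for a, p in pairs)
    tn = sum((not a) and (not p) for a, p in pairs)
    fp = sum((not a) and p for a, p in pairs)
    fn = sum(a and (not p) for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "f_measure": f_measure,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Toy data: two metrics per instance (values are made up for illustration).
    rows = [[1, 2], [2, 1], [50, 40], [60, 70]]
    labels = label_instances(rows)
    print(labels)
    print(evaluate([False, False, True, True], labels))
```

In a setting like the one the summary describes, labels produced this way would stand in for the missing ground truth; the evaluation function only applies when true labels are available for comparison.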