Classifying Software Changes: Clean or Buggy?

Bibliographic Details
Published in: IEEE Transactions on Software Engineering, Vol. 34, no. 2, pp. 181-196
Main Authors: Kim, Sunghun, Whitehead, E. James, Zhang, Yi
Format: Journal Article
Language: English
Published: New York IEEE 01.03.2008
IEEE Computer Society
ISSN: 0098-5589, 1939-3520
Description
Summary: This paper introduces a new technique for finding latent software bugs called change classification. Change classification uses a machine learning classifier to determine whether a new software change is more similar to prior buggy changes, or clean changes. In this manner, change classification predicts the existence of bugs in software changes. The classifier is trained using features (in the machine learning sense) extracted from the revision history of a software project, as stored in its software configuration management repository. The trained classifier can classify changes as buggy or clean with 78% accuracy and 65% buggy change recall (on average). Change classification has several desirable qualities: (1) the prediction granularity is small (a change to a single file), (2) predictions do not require semantic information about the source code, (3) the technique works for a broad array of project types and programming languages, and (4) predictions can be made immediately upon completion of a change. Contributions of the paper include a description of the change classification approach, techniques for extracting features from source code and change histories, a characterization of the performance of change classification across 12 open source projects, and evaluation of the predictive power of different groups of features.
DOI: 10.1109/TSE.2007.70773
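
Illustration: The summary describes training a classifier on features taken from past changes and reporting accuracy and buggy change recall. The following is a minimal sketch of that idea, not the authors' implementation. The record does not name the classifier or the exact feature set, so this sketch assumes a bag-of-words over the change text and a small multinomial Naive Bayes model. The class name `ChangeClassifier`, the helper functions, and the four training changes are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(change_text):
    """Split a change (e.g. diff text plus log message) into lowercase word tokens."""
    return re.findall(r"[a-z_]+", change_text.lower())


class ChangeClassifier:
    """Multinomial Naive Bayes with add-one smoothing.

    Assumption: Naive Bayes is a stand-in; the summary does not say which
    classifier the paper uses. Both labels must appear in the training data.
    """

    def fit(self, changes, labels):
        self.token_counts = {"buggy": Counter(), "clean": Counter()}
        self.label_counts = Counter(labels)
        for text, label in zip(changes, labels):
            self.token_counts[label].update(tokenize(text))
        self.vocab = set(self.token_counts["buggy"]) | set(self.token_counts["clean"])
        return self

    def predict(self, change_text):
        # Tokens never seen in training carry no evidence, so they are skipped.
        tokens = [t for t in tokenize(change_text) if t in self.vocab]
        total = sum(self.label_counts.values())
        scores = {}
        for label, counts in self.token_counts.items():
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.label_counts[label] / total)
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


def accuracy_and_buggy_recall(predicted, actual):
    """Return (accuracy, buggy recall); recall is NaN if no change is actually buggy."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    buggy_total = sum(a == "buggy" for a in actual)
    buggy_found = sum(p == a == "buggy" for p, a in zip(predicted, actual))
    recall = buggy_found / buggy_total if buggy_total else float("nan")
    return correct / len(actual), recall


# Hypothetical labeled history; a real study would mine this from the repository.
HISTORY = [
    ("add strcpy into fixed buffer without length check", "buggy"),
    ("cast pointer and skip null check in loop", "buggy"),
    ("update readme wording", "clean"),
    ("rename variable for clarity", "clean"),
]

model = ChangeClassifier().fit([c for c, _ in HISTORY], [l for _, l in HISTORY])
new_change_label = model.predict("remove null check before strcpy")
```

Because the features are plain tokens, the sketch needs no semantic analysis of the code, and a prediction can be made as soon as a single-file change is complete, which matches qualities (1), (2), and (4) in the summary. Buggy recall is the fraction of truly buggy changes that the classifier flags, so it is reported separately from overall accuracy.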