Comparing the effectiveness of several modeling methods for fault prediction

We compare the effectiveness of four modeling methods—negative binomial regression, recursive partitioning, random forests and Bayesian additive regression trees—for predicting the files likely to contain the most faults for 28 to 35 releases of three large industrial software systems. Predictor var...

Celý popis

Uloženo v:
Podrobná bibliografie
Vydáno v:Empirical software engineering : an international journal Ročník 15; číslo 3; s. 277 - 295
Hlavní autoři: Weyuker, Elaine J., Ostrand, Thomas J., Bell, Robert M.
Médium: Journal Article
Jazyk:angličtina
Vydáno: Boston Springer US 01.06.2010
Springer Nature B.V
Témata:
ISSN:1382-3256, 1573-7616, 1573-7616
On-line přístup:Získat plný text
Tagy: Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
Popis
Shrnutí:We compare the effectiveness of four modeling methods—negative binomial regression, recursive partitioning, random forests and Bayesian additive regression trees—for predicting the files likely to contain the most faults for 28 to 35 releases of three large industrial software systems. Predictor variables included lines of code, file age, faults in the previous release, changes in the previous two releases, and programming language. To compare the effectiveness of the different models, we use two metrics—the percent of faults contained in the top 20% of files identified by the model, and a new, more general metric, the fault-percentile-average . The negative binomial regression and random forests models performed significantly better than recursive partitioning and Bayesian additive regression trees, as assessed by either of the metrics. For each of the three systems, the negative binomial and random forests models identified 20% of the files in each release that contained an average of 76% to 94% of the faults.
Bibliografie:SourceType-Scholarly Journals-1
ObjectType-Feature-1
content type line 14
ObjectType-Article-1
ObjectType-Feature-2
content type line 23
ISSN:1382-3256
1573-7616
1573-7616
DOI:10.1007/s10664-009-9111-2