The ManyBugs and IntroClass Benchmarks for Automated Repair of C Programs

Detailed Bibliography
Published in: IEEE Transactions on Software Engineering, Vol. 41, No. 12, pp. 1236-1256
Main Authors: Le Goues, Claire; Holtschulte, Neal; Smith, Edward K.; Brun, Yuriy; Devanbu, Premkumar; Forrest, Stephanie; Weimer, Westley
Format: Journal Article
Language: English
Published: New York: IEEE (IEEE Computer Society), 1 December 2015
ISSN: 0098-5589, 1939-3520
Description
Summary: The field of automated software repair lacks a set of common benchmark problems. Although benchmark sets are used widely throughout computer science, existing benchmarks are not easily adapted to the problem of automatic defect repair, which has several special requirements. Most important of these is the need for benchmark programs with reproducible, important defects and a deterministic method for assessing if those defects have been repaired. This article details the need for a new set of benchmarks, outlines requirements, and then presents two datasets, ManyBugs and IntroClass, consisting between them of 1,183 defects in 15 C programs. Each dataset is designed to support the comparative evaluation of automatic repair algorithms asking a variety of experimental questions. The datasets have empirically defined guarantees of reproducibility and benchmark quality, and each study object is categorized to facilitate qualitative evaluation and comparisons by category of bug or program. The article presents baseline experimental results on both datasets for three existing repair methods, GenProg, AE, and TrpAutoRepair, to reduce the burden on researchers who adopt these datasets for their own comparative evaluations.
DOI: 10.1109/TSE.2015.2454513