The Patch Overfitting Problem in Automated Program Repair: Practical Magnitude and a Baseline for Realistic Benchmarking
Saved in:
| Title: | The Patch Overfitting Problem in Automated Program Repair: Practical Magnitude and a Baseline for Realistic Benchmarking |
|---|---|
| Authors: | Justyna Petke, Matias Martinez, Maria Kechagia, Aldeida Aleti, Federica Sarro |
| Source: | UPCommons, Portal del coneixement obert de la UPC, Universitat Politècnica de Catalunya (UPC); Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering |
| Publisher Information: | ACM, 2024. |
| Publication Year: | 2024 |
| Subjects: | Automation, Evaluation strategies, Àrees temàtiques de la UPC::Informàtica::Enginyeria del software, Empirical studies, Overfitting problem, Repair techniques, Automated program repair, Patch assessment, Program debugging, Software testing, Software bug |
| Abstract: | Automated program repair (APR) techniques aim to generate patches for software bugs, relying mainly on testing to check their validity. The generation of a large number of plausible yet incorrect patches is widely believed to hinder wider adoption of APR in practice, which has motivated research in automated patch assessment. We reflect on the validity of this motivation and carry out an empirical study to analyse the extent to which 10 APR tools suffer from the overfitting problem in practice. We observe that the number of plausible patches generated by any of the analysed APR tools for a given bug from the Defects4J dataset is remarkably low, with a median of 2, indicating that, in most cases, a developer needs to consider only 2 patches to either find a fix or confirm that none exists. This study reveals that the overfitting problem might not be as severe as previously thought. We reflect on current evaluation strategies for automated patch assessment techniques and propose a Random Selection baseline to assess whether and when using such techniques is beneficial for reducing human effort. We advocate that future work evaluate the benefit of patch overfitting assessment against this random baseline. |
| Document Type: | Article; Conference object |
| File Description: | application/pdf |
| DOI: | 10.1145/3663529.3663776 |
| Access URL: | https://hdl.handle.net/2117/421157 https://doi.org/10.1145/3663529.3663776 |
| Rights: | CC BY |
| Accession Number: | edsair.doi.dedup.....e508dc4cbcc5489c7dc1b317c83cca7d |
| Database: | OpenAIRE |