Automatic Program Repair: A Comparative Study of LLMs on QuixBugs

Bibliographic Details
Title: Automatic Program Repair: A Comparative Study of LLMs on QuixBugs
Authors: Poonam Ponde
Source: International Journal of Intelligent Systems and Applications in Engineering; Vol. 12 No. 23s (2024); 3381–3386
Publisher Information: International Journal of Intelligent Systems and Applications in Engineering, 2024.
Publication Year: 2024
Subject Terms: Bugs, Debugging, Automatic Program Repair, ChatGPT, Gemini
Description: Software bugs are errors or flaws in a program's code that can lead to incorrect or unexpected behavior, making their detection and resolution crucial for reliable and secure software development. Debugging is a human-centric, time-consuming and resource-intensive process, making it one of the most expensive phases in software development. Automatic Program Repair (APR) is an emerging area of research that aims to fix software bugs automatically with minimal human intervention. Traditional APR tools use search-based or learning-based techniques to find and fix software bugs based on test suites and bug patterns, and therefore rely heavily on test cases. AI-driven APR tools are trained on large-scale codebases, open-source bug-fix histories, and benchmarks like QuixBugs. They can analyze buggy code, fix bugs, and generate code patches that are syntactically and semantically correct, which reduces debugging time and improves software reliability. The QuixBugs benchmark consists of 40 programs from the Quixey Challenge in two languages, Python and Java; each program contains a one-line defect and failing test cases. This paper presents a comparative study of APR techniques on this benchmark, evaluating and comparing the automatic bug-fixing capability of LLMs such as ChatGPT and Google Gemini, thereby contributing to the understanding of LLMs' role in automatic program repair.
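For context on the benchmark described above: each QuixBugs program is a small classic algorithm whose reference implementation has been broken by a single-line edit, and repair tools (or LLMs) are judged on whether their patch makes the failing test cases pass. The sketch below is modeled on the benchmark's Python bitcount program; the exact contents here are an approximation for illustration, not a verbatim copy of the QuixBugs sources.

    def bitcount(n):
        """Count the set bits in n (Kernighan's method)."""
        count = 0
        while n:
            n ^= n - 1   # one-line defect: XOR does not clear the lowest set bit,
                         # so the loop fails to terminate and the test cases time out
            count += 1
        return count

    # A correct patch changes only the defective line:
    #     n &= n - 1    # clears exactly the lowest set bit each iteration

Because every defect is confined to one line, a proposed patch can be validated simply by rerunning the program's test suite, which is how repair success is typically measured on QuixBugs.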
Document Type: Article
File Description: application/pdf
Language: English
ISSN: 2147-6799
Access URL: https://www.ijisae.org/index.php/IJISAE/article/view/7707
Rights: CC BY-SA
Accession Number: edsair.issn21476799..6f1da3e1dfe76c46fc520e9a9ad91fdc
Database: OpenAIRE