Refactoring Programs Using Large Language Models with Few-Shot Examples

Detailed bibliography
Published in: Proceedings / Asia Pacific Software Engineering Conference, pp. 151-160
Main authors: Shirafuji, Atsushi; Oda, Yusuke; Suzuki, Jun; Morishita, Makoto; Watanobe, Yutaka
Medium: Conference paper
Language: English
Published: IEEE, 04.12.2023
ISSN: 2640-0715

Description
Summary: A less complex and more straightforward program is a crucial factor that enhances its maintainability and makes writing secure and bug-free programs easier. However, due to the heavy workload and the risk of breaking working programs, programmers are reluctant to perform code refactoring, which also costs them potential learning experiences. To mitigate this, we demonstrate the application of a large language model (LLM), GPT-3.5, to suggest less complex versions of user-written Python programs, aiming to encourage users to learn how to write better programs. We propose a method that leverages few-shot prompting of the LLM, selecting the best-suited code refactoring examples for each target programming problem based on a prior evaluation of prompting with one-shot examples. The quantitative evaluation shows that 95.68% of programs can be refactored by generating 10 candidates each, resulting in a 17.35% reduction in average cyclomatic complexity and a 25.84% decrease in the average number of lines, after filtering to keep only the generated programs that are semantically correct. Furthermore, the qualitative evaluation shows outstanding capability in code formatting, while unnecessary behaviors such as deleting or translating comments are also observed.
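
The record does not include the authors' implementation, but the pipeline the abstract outlines (build a few-shot prompt from selected refactoring examples, sample 10 candidates from GPT-3.5, keep only semantically correct candidates, score complexity) can be sketched in Python. The sketch below is an illustrative assumption, not the paper's code: the prompt format, helper names, and stdin/stdout test harness are invented for illustration, and it assumes the OpenAI chat-completions client and the radon complexity library as stand-ins for whatever the authors actually used.

import subprocess
import sys

from openai import OpenAI  # pip install openai
from radon.complexity import cc_visit  # pip install radon

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def average_complexity(code: str) -> float:
    # Mean cyclomatic complexity over all functions/classes radon finds;
    # a program with no blocks defaults to the minimum complexity of 1.
    blocks = cc_visit(code)
    return sum(b.complexity for b in blocks) / len(blocks) if blocks else 1.0


def is_semantically_correct(code: str, io_tests: list[tuple[str, str]]) -> bool:
    # Keep a candidate only if it reproduces the expected stdout for every
    # (stdin, expected stdout) pair; crashes and timeouts count as failures.
    for stdin, expected in io_tests:
        try:
            result = subprocess.run(
                [sys.executable, "-c", code],
                input=stdin, capture_output=True, text=True, timeout=5,
            )
        except subprocess.TimeoutExpired:
            return False
        if result.returncode != 0 or result.stdout.strip() != expected.strip():
            return False
    return True


def refactor(program: str,
             examples: list[tuple[str, str]],
             io_tests: list[tuple[str, str]],
             n_candidates: int = 10) -> str | None:
    # Few-shot prompt: (before, after) refactoring pairs pre-selected for this
    # problem, followed by the target program awaiting its "After" version.
    prompt = "".join(
        f"### Before\n{before}\n### After\n{after}\n"
        for before, after in examples
    ) + f"### Before\n{program}\n### After\n"
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        n=n_candidates,
        temperature=0.8,
    )
    candidates = [choice.message.content for choice in response.choices]
    correct = [c for c in candidates if is_semantically_correct(c, io_tests)]
    # Of the semantically correct candidates, return the least complex one.
    return min(correct, key=average_complexity, default=None)

Sampling several candidates and keeping the least complex correct one reflects the abstract's report that generating 10 candidates per program allowed 95.68% of programs to be refactored; in the paper's setting, semantic correctness would be judged against each programming problem's test cases.
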
DOI: 10.1109/APSEC60848.2023.00025