Towards Understanding the Characteristics of Code Generation Errors Made by Large Language Models

Published in: Proceedings / International Conference on Software Engineering, pp. 2587-2599
Main authors: Wang, Zhijie, Zhou, Zijie, Song, Da, Huang, Yuheng, Chen, Shengmai, Ma, Lei, Zhang, Tianyi
Format: Conference paper
Language: English
Publication details: IEEE, 26.04.2025
Description
Summary: Large Language Models (LLMs) have demonstrated unprecedented capabilities in code generation. However, there remains a limited understanding of the code generation errors that LLMs can produce. To bridge this gap, we conducted an in-depth analysis of code generation errors across six representative LLMs on the HumanEval dataset. Specifically, we first employed open coding and thematic analysis to distill a comprehensive taxonomy of code generation errors. We analyzed two dimensions of error characteristics: semantic characteristics and syntactic characteristics. Our analysis revealed that LLMs often made non-trivial, multi-line code generation errors in various locations and with various root causes. We further analyzed the correlation between these errors and task complexity as well as test pass rate. Our findings highlighted several challenges in locating and fixing code generation errors made by LLMs. Finally, we discussed several future directions to address these challenges.
ISSN: 1558-1225
DOI: 10.1109/ICSE55347.2025.00180