Assessing the Performance of AI-Generated Code: A Case Study on GitHub Copilot
| Published in: | Proceedings - International Symposium on Software Reliability Engineering, pp. 216-227 |
|---|---|
| Main Authors: | , , , , , |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 28.10.2024 |
| ISSN: | 2332-6549 |
| Online Access: | Get full text |
| Summary: | The integration of Large Language Models (LLMs) into software development tools like GitHub Copilot holds the promise of transforming code generation processes. While AI-driven code generation presents numerous advantages for software development, code generated by large language models may introduce challenges related to security, privacy, and copyright. However, the performance implications of AI-generated code remain insufficiently explored. This study conducts an empirical analysis of the performance regressions of code generated by GitHub Copilot across three distinct datasets: HumanEval, AixBench, and MBPP. We adopt a comprehensive methodology encompassing static and dynamic performance analyses to assess the effectiveness of the generated code. Our findings reveal that although the generated code is functionally correct, it frequently exhibits performance regressions compared to code solutions crafted by humans. We further investigate the code-level root causes responsible for these performance regressions and identify four major root causes: inefficient function calls, inefficient looping, inefficient algorithms, and inefficient use of language features, which we refine into a total of ten sub-categories. Additionally, we explore prompt engineering as a potential strategy for optimizing performance. The outcomes suggest that meticulous prompt design can enhance the performance of AI-generated code. This research offers valuable insights contributing to a more comprehensive understanding of AI-assisted code generation. |
|---|---|
| DOI: | 10.1109/ISSRE62328.2024.00030 |
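
To make the abstract's root-cause categories concrete, the following is a hypothetical Python sketch of the "inefficient looping" pattern the study describes, paired with the kind of dynamic timing measurement its methodology implies. Both functions and the `timeit` harness are illustrative assumptions, not code drawn from the paper's datasets.

```python
# Hypothetical illustration of the "inefficient looping" root cause
# described in the abstract; not taken from HumanEval, AixBench, or MBPP.
import timeit

def contains_duplicate_slow(values):
    # Quadratic: re-scans the processed prefix on every iteration,
    # the shape of regression the paper attributes to generated code.
    for i, v in enumerate(values):
        if v in values[:i]:  # O(n) slice plus O(n) scan per element
            return True
    return False

def contains_duplicate_fast(values):
    # Linear: a set gives O(1) membership checks, the typical
    # human-written counterpart.
    seen = set()
    for v in values:
        if v in seen:
            return True
        seen.add(v)
    return False

if __name__ == "__main__":
    data = list(range(5000))  # no duplicates: worst case for both versions
    print("slow:", timeit.timeit(lambda: contains_duplicate_slow(data), number=3))
    print("fast:", timeit.timeit(lambda: contains_duplicate_fast(data), number=3))
```

Both versions return the same results, so a purely functional test passes either one; only a dynamic performance analysis of the sort the study performs would surface the quadratic regression.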
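The abstract also reports that prompt engineering can mitigate such regressions. As a hedged sketch only (the study's actual prompt templates are not reproduced here), a performance-oriented refinement might add explicit complexity constraints to a baseline task description:

```python
# Hypothetical prompt refinement; neither string is taken from the paper.
BASELINE_PROMPT = (
    "Write a Python function has_duplicates(values) that returns True "
    "if the list contains any repeated element."
)

# Adding explicit performance constraints is one meticulous-prompt-design
# strategy consistent with the abstract's findings.
PERFORMANCE_PROMPT = (
    BASELINE_PROMPT
    + " Optimize for time complexity: avoid nested scans over the input "
      "and prefer O(n) approaches such as set-based membership checks."
)
```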