Comparing Large Language Models and Human Programmers for Generating Programming Code
Saved in:
| Published in: | Advanced Science, Volume 12, Issue 8, p. e2412279 |
|---|---|
| Main authors: | , |
| Format: | Journal Article |
| Language: | English |
| Published: | Germany: John Wiley & Sons, Inc., 01.02.2025 |
| Subjects: | |
| ISSN: | 2198-3844 |
| Online access: | Get full text |
| Summary: | The performance of seven large language models (LLMs) in generating programming code using various prompt strategies, programming languages, and task difficulties is systematically evaluated. GPT‐4 substantially outperforms other LLMs, including Gemini Ultra and Claude 2. The coding performance of GPT‐4 varies considerably with different prompt strategies. In most LeetCode and GeeksforGeeks coding contests evaluated in this study, GPT‐4, employing the optimal prompt strategy, outperforms 85 percent of human participants in a competitive environment, many of whom are students and professionals with moderate programming experience. GPT‐4 demonstrates strong capabilities in translating code between different programming languages and in learning from past errors. The computational efficiency of the code generated by GPT‐4 is comparable to that of human programmers. GPT‐4 is also capable of handling broader programming tasks, including front‐end design and database operations. These results suggest that GPT‐4 has the potential to serve as a reliable assistant in programming code generation and software development. A programming assistant is designed based on an optimal prompt strategy to facilitate the practical use of LLMs for programming. The performance of seven large language models is evaluated for programming code generation across various prompts, languages, and task difficulties. GPT‐4 consistently outperforms other models and excels in tasks such as code translation, error learning, and efficient code generation. It surpasses 85% of human participants in most LeetCode and GeeksforGeeks contests when using the optimal prompt strategy. |
|---|---|
| Bibliography: | ObjectType-Article-1; SourceType-Scholarly Journals-1; ObjectType-Feature-2 |
| ISSN: | 2198-3844 |
| DOI: | 10.1002/advs.202412279 |
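
For context on the workflow the summary describes: the evaluation amounts to sending coding problems to an LLM under a chosen prompt strategy and checking the returned code. The snippet below is a minimal illustrative sketch of that loop, assuming the `openai` Python package and an OpenAI-style chat API; the model name, prompt wording, and helper function are placeholders, not the paper's actual "optimal prompt strategy" or programming-assistant implementation.

```python
# Illustrative sketch only: query an OpenAI-style chat API with a step-by-step
# prompt for a coding problem and extract the returned code block.
# The prompt text and function name are hypothetical, not taken from the paper.
import re
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def generate_solution(problem_statement: str, language: str = "Python") -> str:
    """Ask the model to outline an approach first, then emit a code-only answer."""
    prompt = (
        f"Solve the following programming problem in {language}.\n"
        "First outline your approach in a few sentences, then give the complete "
        "solution in a single fenced code block.\n\n"
        f"{problem_statement}"
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    answer = response.choices[0].message.content
    # Keep only the first fenced code block, if present; otherwise return the raw reply.
    match = re.search(r"```(?:\w+)?\n(.*?)```", answer, re.DOTALL)
    return match.group(1).strip() if match else answer


if __name__ == "__main__":
    print(generate_solution(
        "Given an integer array nums, return the length of the "
        "longest strictly increasing subsequence."
    ))
```

In a LeetCode- or GeeksforGeeks-style evaluation, the extracted code would then be submitted to the contest judge, and its acceptance and runtime compared against the human participant distribution.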