Comparing Large Language Models and Human Programmers for Generating Programming Code


Detailed Description

Bibliographic Details
Published in: Advanced Science, Vol. 12, Issue 8, p. e2412279
Main Authors: Hou, Wenpin; Ji, Zhicheng
Format: Journal Article
Language: English
Published: Germany: John Wiley & Sons, Inc, 01.02.2025
ISSN: 2198-3844
Online Access: Full text
Description
Abstract: The performance of seven large language models (LLMs) in generating programming code using various prompt strategies, programming languages, and task difficulties is systematically evaluated. GPT-4 substantially outperforms the other LLMs, including Gemini Ultra and Claude 2. The coding performance of GPT-4 varies considerably with different prompt strategies. In most LeetCode and GeeksforGeeks coding contests evaluated in this study, GPT-4, employing the optimal prompt strategy, outperforms 85% of human participants in a competitive environment, many of whom are students and professionals with moderate programming experience. GPT-4 demonstrates strong capabilities in translating code between different programming languages and in learning from past errors. The computational efficiency of the code generated by GPT-4 is comparable to that of human programmers. GPT-4 is also capable of handling broader programming tasks, including front-end design and database operations. These results suggest that GPT-4 has the potential to serve as a reliable assistant in programming code generation and software development. A programming assistant is designed based on the optimal prompt strategy to facilitate the practical use of LLMs for programming.
DOI: 10.1002/advs.202412279