Who is the better AI software engineer? A comparative study of AI models in coding tasks.

Bibliographic Details
Title: Who is the better AI software engineer? A comparative study of AI models in coding tasks.
Authors: Zhijun, Peng (1,2) 20035699@qq.com; Daud, Siti Norbaya (2) norbayadaud@segi.edu.my; Yang, Yanyan (2) hnysyanyan@163.com
Source: AIP Conference Proceedings. 2025, Vol. 3324 Issue 1, p1-11. 11p.
Subject Terms: *Computer software development; *Software engineers; *Machine learning; *Computer performance; *Computer science education; *Human-artificial intelligence interaction; *Computational complexity
Abstract: This study evaluates the programming capabilities of five advanced AI models through competitive programming tasks and open-ended software development challenges. The research assesses their performance on 15 problems from the 2023 Chinese Collegiate Programming Contest and two practical tasks: developing a tile-flipping game and a Reversi game with AI opponent functionality. The evaluation framework combines automated judging for competitive tasks with functional suitability assessment and static code quality analysis for open-ended tasks. Results indicate that top models like Claude 3.5 Sonnet and ChatGPT-4o approach human-level performance in competitive programming, while demonstrating strong capabilities in practical application development. However, performance gaps emerge in complex, open-ended tasks. The study reveals a clear hierarchy among AI models, with the top performers consistently outperforming others across various tasks. The research also highlights the impact of task complexity on AI performance, with some models struggling significantly as complexity increases. This comprehensive evaluation provides insights into the current state of AI programming abilities, highlighting areas for improvement and raising questions about the future role of AI in software development and computer science education. The authors suggest future research directions, including expanding the scope of evaluations and exploring human-AI collaboration in real-world programming contexts. [ABSTRACT FROM AUTHOR]
Database: Academic Search Index
ISSN: 0094-243X
DOI: 10.1063/5.0290644