Automatic Unit Test Generation for Programming Assignments Using Large Language Models

Bibliographic Details
Published in: IEEE/ACM International Conference on Software Engineering: Software Engineering Education and Training (Online), pp. 242-252
Main Authors: Zheng, Kaisheng; Shen, Yuanyang; Tao, Yida
Format: Conference Proceeding
Language: English
Published: IEEE, 27.04.2025
Subjects: Computer bugs; Large language models; Programming assignments; Programming profession; Reproducibility of results; Semantics; Syntactics; Test pattern generators; Testing; Unit test generation; Usability; Writing
ISSN: 2832-7578
Abstract Programming knowledge is a crucial aspect of computer science education, and unit testing is commonly employed to automatically assess programming assignments. Instructors and teaching assistants typically invest considerable effort in writing unit tests, which may still be vulnerable to human oversight and mistakes. In this work, we explored the feasibility of using Large Language Models (LLMs) to automate the assessment of programming assignments. In particular, we proposed two approaches: the plain approach, which uses GPT-4o-mini in a vanilla setting, and the augmented approach, which integrates additional strategies such as tailored prompts with syntax and semantic constraints and a feedback mechanism informed by test-effectiveness metrics. We evaluate the two approaches on six real-world programming assignments from an introductory-level programming course at our university. Compared to the plain approach, the augmented approach improves the usability and effectiveness of the generated unit tests, reducing compilation errors by 85% while improving statement coverage and mutation scores by 1.7x and 2.1x, respectively. In addition, the augmented approach complements human-written tests by covering additional program behaviors. In a case study of 1296 student submissions that pass human-written tests, the augmented approach detected new bugs in 13% of submissions, with an accuracy of 27%. These results not only demonstrate the potential of LLMs in generating useful unit tests for programming assignments, but also highlight strategies that can effectively enhance LLMs' capabilities to augment human-written tests, offering practical benefits for both educators and students.
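
For illustration, the feedback mechanism described in the abstract can be read as a generate-measure-refine loop. The Python sketch below shows one way such a loop could look; the stub helpers (llm_generate_tests, compiles, statement_coverage, mutation_score), the thresholds, and the prompt wording are assumptions for illustration, not the authors' implementation.

    # A self-contained sketch of the feedback loop described in the
    # abstract: generate tests with an LLM, measure test-effectiveness
    # metrics, and feed the metrics back into the next prompt. The helper
    # functions are stubs standing in for a real LLM API and real
    # coverage/mutation tooling; none of this is the authors' code.

    def llm_generate_tests(prompt: str) -> str:
        """Stub for an LLM call (the paper uses GPT-4o-mini)."""
        return "// generated unit tests for: " + prompt.splitlines()[0]

    def compiles(tests: str) -> bool:
        """Stub: in practice, invoke the course toolchain's compiler."""
        return True

    def statement_coverage(tests: str) -> float:
        """Stub: in practice, run a coverage tool on the tests."""
        return 0.8

    def mutation_score(tests: str) -> float:
        """Stub: in practice, run a mutation-testing tool."""
        return 0.6

    def generate_tests(spec: str, max_rounds: int = 3,
                       cov_goal: float = 0.9, mut_goal: float = 0.8) -> str:
        # Tailored prompt: state syntax and semantic constraints up front.
        prompt = (spec + " Constraints: tests must compile against the "
                  "published assignment API and check the specified behavior.")
        tests = ""
        for _ in range(max_rounds):
            tests = llm_generate_tests(prompt)
            if not compiles(tests):
                prompt += " Feedback: the previous tests did not compile."
                continue
            cov, mut = statement_coverage(tests), mutation_score(tests)
            if cov >= cov_goal and mut >= mut_goal:
                break  # tests are effective enough; stop iterating
            # Feedback mechanism: report effectiveness metrics to the LLM.
            prompt += (f" Feedback: {cov:.0%} statement coverage and "
                       f"{mut:.0%} mutation score; cover the remaining "
                       "statements and kill the surviving mutants.")
        return tests

    print(generate_tests("Implement a stack with push and pop."))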
Authors:
– Zheng, Kaisheng (12110722@mail.sustech.edu.cn), Southern University of Science and Technology, Department of Computer Science and Engineering, Shenzhen, China
– Shen, Yuanyang (12112217@mail.sustech.edu.cn), Southern University of Science and Technology, Department of Computer Science and Engineering, Shenzhen, China
– Tao, Yida (taoyd@sustech.edu.cn), Southern University of Science and Technology, Department of Computer Science and Engineering, Shenzhen, China
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/CSEET66350.2025.00031
EISBN 9798331537098
EISSN 2832-7578
EndPage 252
ExternalDocumentID 11024401
Genre orig-research
PageCount 11
PublicationDate 2025-April-27
PublicationTitle IEEE/ACM International Conference on Software Engineering: Software Engineering Education and Training (Online)
PublicationTitleAbbrev CSEET
PublicationYear 2025
Publisher IEEE
StartPage 242
SubjectTerms Computer bugs
Large language models
programming assignments
Programming profession
Reproducibility of results
Semantics
Syntactics
Test pattern generators
Testing
Unit test generation
Usability
Writing
Title Automatic Unit Test Generation for Programming Assignments Using Large Language Models
URI https://ieeexplore.ieee.org/document/11024401