Automatic Unit Test Generation for Programming Assignments Using Large Language Models

Bibliographic Details
Published in: IEEE/ACM International Conference on Software Engineering: Software Engineering Education and Training (Online), pp. 242-252
Main Authors: Zheng, Kaisheng; Shen, Yuanyang; Tao, Yida
Format: Conference Proceeding
Language: English
Published: IEEE, 27.04.2025
Subjects: Computer bugs; Large language models; Programming assignments; Programming profession; Reproducibility of results; Semantics; Syntactics; Test pattern generators; Testing; Unit test generation; Usability; Writing
ISSN: 2832-7578
Abstract Programming knowledge is a crucial aspect of computer science education, and unit testing is commonly employed to automatically assess programming assignments. Instructors and teaching assistants typically invest considerable effort in writing unit tests, which may still be vulnerable to human oversight and mistakes. In this work, we explored the feasibility of using Large Language Models (LLMs) to automate the assessment of programming assignments. In particular, we proposed two approaches: the plain approach, which uses GPT-4o-mini in a vanilla setting, and the augmented approach, which integrates additional strategies such as tailored prompts with syntax and semantic constraints and a feedback mechanism informed by test-effectiveness metrics. We evaluate the two approaches on six real-world programming assignments from an introductory-level programming course at our university. Compared to the plain approach, the augmented approach improves the usability and effectiveness of the generated unit tests, reducing compilation errors by 85% while improving statement coverage and mutation scores by 1.7x and 2.1x, respectively. In addition, the augmented approach complements human-written tests by covering additional program behaviors. In a case study of 1296 student submissions that pass human-written tests, the augmented approach detected new bugs in 13% of submissions, with an accuracy of 27%. These results not only demonstrate the potential of LLMs in generating useful unit tests for programming assignments, but also highlight strategies that can effectively enhance LLMs' capabilities to augment human-written tests, offering practical benefits for both educators and students.
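
For illustration, the feedback mechanism described in the abstract can be read as a generate-measure-refine loop. The Python sketch below shows one way such a loop could look; the stub helpers (llm_generate_tests, compiles, statement_coverage, mutation_score), the thresholds, and the prompt wording are assumptions for illustration, not the authors' implementation.

    # A self-contained sketch of the feedback loop described in the
    # abstract: generate tests with an LLM, measure test-effectiveness
    # metrics, and feed the metrics back into the next prompt. The helper
    # functions are stubs standing in for a real LLM API and real
    # coverage/mutation tooling; none of this is the authors' code.

    def llm_generate_tests(prompt: str) -> str:
        """Stub for an LLM call (the paper uses GPT-4o-mini)."""
        return "// generated unit tests for: " + prompt.splitlines()[0]

    def compiles(tests: str) -> bool:
        """Stub: in practice, invoke the course toolchain's compiler."""
        return True

    def statement_coverage(tests: str) -> float:
        """Stub: in practice, run a coverage tool on the tests."""
        return 0.8

    def mutation_score(tests: str) -> float:
        """Stub: in practice, run a mutation-testing tool."""
        return 0.6

    def generate_tests(spec: str, max_rounds: int = 3,
                       cov_goal: float = 0.9, mut_goal: float = 0.8) -> str:
        # Tailored prompt: state syntax and semantic constraints up front.
        prompt = (spec + " Constraints: tests must compile against the "
                  "published assignment API and check the specified behavior.")
        tests = ""
        for _ in range(max_rounds):
            tests = llm_generate_tests(prompt)
            if not compiles(tests):
                prompt += " Feedback: the previous tests did not compile."
                continue
            cov, mut = statement_coverage(tests), mutation_score(tests)
            if cov >= cov_goal and mut >= mut_goal:
                break  # tests are effective enough; stop iterating
            # Feedback mechanism: report effectiveness metrics to the LLM.
            prompt += (f" Feedback: {cov:.0%} statement coverage and "
                       f"{mut:.0%} mutation score; cover the remaining "
                       "statements and kill the surviving mutants.")
        return tests

    print(generate_tests("Implement a stack with push and pop."))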
Authors:
– Zheng, Kaisheng (12110722@mail.sustech.edu.cn), Southern University of Science and Technology, Department of Computer Science and Engineering, Shenzhen, China
– Shen, Yuanyang (12112217@mail.sustech.edu.cn), Southern University of Science and Technology, Department of Computer Science and Engineering, Shenzhen, China
– Tao, Yida (taoyd@sustech.edu.cn), Southern University of Science and Technology, Department of Computer Science and Engineering, Shenzhen, China
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/CSEET66350.2025.00031
EISBN 9798331537098
EISSN 2832-7578
EndPage 252
ExternalDocumentID 11024401
Genre orig-research
PageCount 11
PublicationDate 2025-April-27
PublicationTitle IEEE/ACM International Conference on Software Engineering: Software Engineering Education and Training (Online)
PublicationTitleAbbrev CSEET
PublicationYear 2025
Publisher IEEE
StartPage 242
SubjectTerms Computer bugs
Large language models
programming assignments
Programming profession
Reproducibility of results
Semantics
Syntactics
Test pattern generators
Testing
Unit test generation
Usability
Writing
Title Automatic Unit Test Generation for Programming Assignments Using Large Language Models
URI https://ieeexplore.ieee.org/document/11024401