Automatic Unit Test Generation for Programming Assignments Using Large Language Models
| Published in: | IEEE/ACM International Conference on Software Engineering: Software Engineering Education and Training (Online), pp. 242-252 |
|---|---|
| Main Authors: | Zheng, Kaisheng; Shen, Yuanyang; Tao, Yida |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 27.04.2025 |
| ISSN: | 2832-7578 |
| Abstract | Programming knowledge is a crucial aspect of computer science education, and unit testing is commonly employed to automatically assess programming assignments. Instructors and teaching assistants typically invest considerable effort in writing unit tests, which may still be vulnerable to human oversight and mistakes. In this work, we explore the feasibility of using Large Language Models (LLMs) to automate the assessment of programming assignments. In particular, we propose two approaches: a plain approach that uses GPT-4o-mini in a vanilla setting, and an augmented approach that integrates additional strategies, such as tailored prompts with syntax and semantic constraints and a feedback mechanism that reports test-effectiveness metrics. We evaluate the two approaches on six real-world programming assignments from an introductory-level programming course at our university. Compared to the plain approach, the augmented approach improves the usability and effectiveness of the generated unit tests, reducing compilation errors by 85% while improving statement coverage and mutation scores by 1.7x and 2.1x, respectively. In addition, the augmented approach complements human-written tests by covering additional program behaviors. In a case study of 1296 student submissions that pass human-written tests, the augmented approach detected new bugs in 13% of submissions, with an accuracy of 27%. These results not only demonstrate the potential of LLMs to generate useful unit tests for programming assignments, but also highlight strategies that can effectively enhance LLMs' ability to augment human-written tests, offering practical benefits for both educators and students. |
|---|---|
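The abstract's augmented approach combines constrained prompts with a feedback loop that reports test-effectiveness metrics back to the model. The paper does not publish its implementation here, so the following is a minimal hypothetical sketch of such a loop: the LLM call (`generate_tests`), the metric computation (`measure`), and the prompt builder are all invented stand-ins, not the authors' code.

```python
# Hypothetical sketch of a metric-driven regeneration loop, as suggested
# by the abstract. All names (build_prompt, refine, generate_tests, measure)
# are assumptions for illustration; the LLM and metrics are stubbed.

def build_prompt(assignment: str, feedback: str = "") -> str:
    """Compose a prompt with syntax/semantic constraints plus metric feedback."""
    prompt = (
        f"Write unit tests for: {assignment}\n"
        "Constraints: tests must compile and call only the public API.\n"
    )
    if feedback:
        prompt += f"Previous attempt feedback: {feedback}\n"
    return prompt

def refine(assignment, generate_tests, measure, rounds=3, target=0.9):
    """Regenerate tests until statement coverage reaches the target,
    feeding coverage and mutation scores back into the next prompt."""
    feedback = ""
    best = (0.0, None)
    for _ in range(rounds):
        tests = generate_tests(build_prompt(assignment, feedback))
        coverage, mutation = measure(tests)
        if coverage > best[0]:
            best = (coverage, tests)
        if coverage >= target:
            break
        feedback = (f"statement coverage {coverage:.0%}, "
                    f"mutation score {mutation:.0%}; cover untested branches")
    return best

# Toy stand-ins so the loop runs end to end without a real model:
def fake_llm(prompt):
    # Pretend the model improves when it sees metric feedback ("%" signs).
    return "tests-v%d" % (prompt.count("%") + 1)

def fake_measure(tests):
    # Pretend both metrics grow with each test-suite version.
    version = int(tests.split("-v")[1])
    return min(1.0, 0.4 * version), min(1.0, 0.3 * version)

coverage, tests = refine("Stack assignment", fake_llm, fake_measure)
```

In a real setting, `measure` would compile the suite and run a coverage tool and a mutation-testing tool against the reference solution; the stubs here only exercise the control flow.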
| Author | Zheng, Kaisheng; Shen, Yuanyang; Tao, Yida |
| CODEN | IEEPAD |
| DOI | 10.1109/CSEET66350.2025.00031 |
| EISBN | 9798331537098 |
| EISSN | 2832-7578 |
| SubjectTerms | Computer bugs; Large language models; programming assignments; Programming profession; Reproducibility of results; Semantics; Syntactics; Test pattern generators; Testing; Unit test generation; Usability; Writing |
| URI | https://ieeexplore.ieee.org/document/11024401 |