A Large-Scale Empirical Study on Fine-Tuning Large Language Models for Unit Testing
| Published in: | Proceedings of the ACM on Software Engineering, Volume 2, Issue ISSTA, pp. 1678–1700 |
|---|---|
| Main authors: | Ye Shang, Quanjun Zhang, Chunrong Fang, Siqi Gu, Jianyi Zhou, Zhenyu Chen |
| Format: | Journal Article |
| Language: | English |
| Published: | New York, NY, USA: ACM, June 22, 2025 |
| ISSN: | 2994-970X |
| Online access: | https://dl.acm.org/doi/10.1145/3728951 |
| Abstract | Unit testing plays a pivotal role in software development, improving software quality and reliability. However, generating effective test cases manually is time-consuming, prompting interest in unit testing research. Recently, Large Language Models (LLMs) have shown potential in various unit testing tasks, including test generation, assertion generation, and test evolution, but existing studies are limited in scope and lack a systematic evaluation of the effectiveness of LLMs. To bridge this gap, we present a large-scale empirical study on fine-tuning LLMs for unit testing. Our study involves three unit testing tasks, five benchmarks, eight evaluation metrics, and 37 popular LLMs across various architectures and sizes, consuming over 3,000 NVIDIA A100 GPU hours. We focus on three key research questions: (1) the performance of LLMs compared to state-of-the-art methods, (2) the impact of different factors on LLM performance, and (3) the effectiveness of fine-tuning versus prompt engineering. Our findings reveal that LLMs outperform existing state-of-the-art approaches on all three unit testing tasks across nearly all metrics, highlighting the potential of fine-tuning LLMs in unit testing tasks. Furthermore, large-scale, decoder-only models achieve the best results across tasks, while encoder-decoder models perform better under the same parameter scale. Additionally, the comparison between fine-tuning and prompt engineering approaches reveals the considerable potential of prompt engineering in unit testing tasks. We then discuss open concerns in the test generation task, including data leakage issues, bug detection capabilities, and metrics comparisons. Finally, we pinpoint various practical guidelines for LLM-based approaches to unit testing tasks in the near future. Overall, our work demonstrates the promising future of fine-tuning LLMs on unit testing tasks and reduces the manual efforts of unit testing experts in practical scenarios. |
|---|---|
| ArticleNumber | ISSTA074 |
| Authors: | Ye Shang (Nanjing University, Nanjing, China); Quanjun Zhang (Nanjing University of Science and Technology, Nanjing, China); Chunrong Fang (Nanjing University, Nanjing, China); Siqi Gu (Nanjing University, Nanjing, China); Jianyi Zhou (Huawei Cloud Computing Technologies, Beijing, China); Zhenyu Chen (Shenzhen Research Institute of Nanjing University, Shenzhen, China) |
| DOI | 10.1145/3728951 |
| Discipline | Computer Science |
| EISSN | 2994-970X |
| Funding: | National Natural Science Foundation of China (U24A20337, 61932012, 62372228); Science, Technology and Innovation Commission of Shenzhen Municipality (CJGJZD20200617103001003, 2021Szvup057); CCF-Huawei Populus Grove Fund (CCF-HuaweiSE202304, CCF-HuaweiSY202306) |
| Open Access: | Yes |
| Peer Reviewed: | Yes |
| Keywords: | Unit Testing; Large Language Model; AI for SE; Software Testing |
| License: | Creative Commons Attribution 4.0 International (CC BY 4.0) |
| OpenAccessLink | https://dl.acm.org/doi/10.1145/3728951 |
| PageCount | 23 |
| PublicationTitleAbbrev | ACM PACMSE |
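The abstract reports eight evaluation metrics across the three unit testing tasks. As a rough illustration of what such metrics measure, the sketch below implements two simplified, hypothetical variants for generated assertions: exact match and clipped token-level precision (a crude unigram proxy for BLEU-style scores). The tokenizer, metric definitions, and example assertions are assumptions for illustration, not the paper's actual evaluation code.

```python
import re
from collections import Counter


def tokenize(code: str) -> list[str]:
    """Naive tokenizer for Java-like snippets: identifiers or single symbols."""
    return re.findall(r"[A-Za-z_]\w*|\S", code)


def exact_match(reference: str, candidate: str) -> bool:
    """True when the generated assertion is token-identical to the reference."""
    return tokenize(reference) == tokenize(candidate)


def token_precision(reference: str, candidate: str) -> float:
    """Fraction of candidate tokens also in the reference, with clipped counts."""
    ref_counts = Counter(tokenize(reference))
    cand_counts = Counter(tokenize(candidate))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(n, ref_counts[tok]) for tok, n in cand_counts.items())
    return matched / total


# Hypothetical reference assertion and two model outputs.
reference = "assertEquals(4, calculator.add(2, 2));"
print(exact_match(reference, "assertEquals(4, calculator.add(2, 2));"))   # True
print(token_precision(reference, "assertTrue(calculator.add(2, 2) == 4);"))
```

A semantically equivalent but differently phrased assertion scores zero on exact match while still earning partial credit on token precision, which is one reason studies like this one compare multiple metrics side by side.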
In Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022. 35, 27730–27744. – reference: Quanjun Zhang, Chunrong Fang, Yang Xie, Yaxin Zhang, Yun Yang, Weisong Sun, Shengcheng Yu, and Zhenyu Chen. 2023. A Survey on Large Language Models for Software Engineering. CoRR, abs/2312.15223 (2023), arXiv–2312. – reference: Max Schäfer, Sarah Nadi, Aryaz Eghbali, and Frank Tip. 2024. An Empirical Evaluation of Using Large Language Models for Automated Unit Test Generation. IEEE Trans. Software Eng., 50, 1 (2024), 85–105. – reference: Junjie Wang, Yuchao Huang, Chunyang Chen, Zhe Liu, Song Wang, and Qing Wang. 2024. Software Testing With Large Language Models: Survey, Landscape, and Vision. IEEE Trans. Software Eng., 50, 4 (2024), 911–936. – reference: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA. 5998–6008. – reference: Carlos Pacheco and Michael D. Ernst. 2007. Randoop: Feedback-Directed Random Testing for Java. In Companion to the 22nd Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2007, October 21-25, 2007, Montreal, Quebec, Canada. ACM, 815–816. – reference: Daya Guo, Qihao Zhu, Dejian Yang, Zhenda Xie, Kai Dong, Wentao Zhang, Guanting Chen, Xiao Bi, Y. Wu, Y. K. Li, Fuli Luo, Yingfei Xiong, and Wenfeng Liang. 2024. DeepSeek-Coder: When the Large Language Model Meets Programming - The Rise of Code Intelligence. CoRR, abs/2401.14196 (2024), arXiv–2401. – reference: Weisong Sun, Chunrong Fang, Yudu You, Yuchen Chen, Yi Liu, Chong Wang, Jian Zhang, Quanjun Zhang, Hanwei Qian, Wei Zhao, Yang Liu, and Zhenyu Chen. 
2023. A Prompt Learning Framework for Source Code Summarization. CoRR, abs/2312.16066 (2023), arXiv–2312. – reference: Cody Watson, Michele Tufano, Kevin Moran, Gabriele Bavota, and Denys Poshyvanyk. 2020. On Learning Meaningful Assert Statements for Unit Test Cases. In ICSE ’20: 42nd International Conference on Software Engineering, Seoul, South Korea, 27 June - 19 July, 2020. ACM, 1398–1409. – reference: Xinyi Hou, Yanjie Zhao, Yue Liu, Zhou Yang, Kailong Wang, Li Li, Xiapu Luo, David Lo, John C. Grundy, and Haoyu Wang. 2023. Large Language Models for Software Engineering: A Systematic Literature Review. CoRR, abs/2308.10620 (2023), arXiv–2308. – reference: Michele Tufano, Shao Kun Deng, Neel Sundaresan, and Alexey Svyatkovskiy. 2022. METHODS2TEST: A Dataset of Focal Methods Mapped to Test Cases. In 19th IEEE/ACM International Conference on Mining Software Repositories, MSR 2022, Pittsburgh, PA, USA, May 23-24, 2022. ACM, 299–303. – reference: Lei Ma, Cyrille Artho, Cheng Zhang, Hiroyuki Sato, Johannes Gmeiner, and Rudolf Ramler. 2015. GRT: Program-Analysis-Guided Random Testing (T). In 30th IEEE/ACM International Conference on Automated Software Engineering, ASE 2015, Lincoln, NE, USA, November 9-13, 2015. IEEE Computer Society, 212–223. – reference: Tao Xie, Darko Marinov, Wolfram Schulte, and David Notkin. 2005. Symstra: A Framework for Generating Object-Oriented Unit Tests Using Symbolic Execution. In Tools and Algorithms for the Construction and Analysis of Systems, 11th International Conference, TACAS 2005, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2005,, April 4-8, 2005, Proceedings (Lecture Notes in Computer Science, Vol. 3440). Springer, 365–381. – reference: Weifeng Sun, Meng Yan, Zhongxin Liu, Xin Xia, Yan Lei, and David Lo. 2023. Revisiting the Identification of the Co-evolution of Production and Test Code. ACM Trans. Softw. Eng. Methodol., 32, 6 (2023), 152:1–152:37. 
– reference: Chao Ni, Xiaoya Wang, Liushan Chen, Dehai Zhao, Zhengong Cai, Shaohua Wang, and Xiaohu Yang. 2024. CasModaTest: A Cascaded and Model-Agnostic Self-Directed Framework for Unit Test Generation. CoRR, abs/2406.15743 (2024), arXiv–2406. – reference: Yue Wang, Weishi Wang, Shafiq R. Joty, and Steven C. H. Hoi. 2021. CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 7-11 November, 2021. Association for Computational Linguistics, 8696–8708. – reference: Zhiqiang Yuan, Mingwei Liu, Shiji Ding, Kaixin Wang, Yixuan Chen, Xin Peng, and Yiling Lou. 2024. Evaluating and Improving ChatGPT for Unit Test Generation. Proc. ACM Softw. Eng., 1, FSE (2024), 1703–1726. – ident: e_1_2_1_36_1 doi: 10.1145/1390630.1390635 – volume: 2 volume-title: Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - year: 2010 ident: e_1_2_1_8_1 – ident: e_1_2_1_29_1 doi: 10.1109/ASE.2015.49 – volume-title: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics year: 2002 ident: e_1_2_1_35_1 – volume-title: Briand year: 2024 ident: e_1_2_1_53_1 – volume-title: Augmenting LLMs to Repair Obsolete Test Cases with Static Collector and Neural Reranker. CoRR, abs/2407.03625 year: 2024 ident: e_1_2_1_27_1 – volume-title: Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022 year: 2022 ident: e_1_2_1_33_1 – volume-title: A Critical Review of Large Language Model on Software Engineering: An Example from ChatGPT and Automated Program Repair. 
CoRR, abs/2310.08879 year: 2023 ident: e_1_2_1_65_1 – volume: 429 volume-title: Second International Conference year: 1999 ident: e_1_2_1_32_1 – ident: e_1_2_1_40_1 doi: 10.1109/TSE.2023.3334955 – volume-title: Dwyer year: 2024 ident: e_1_2_1_22_1 – ident: e_1_2_1_4_1 – volume-title: Proceedings of the 46th IEEE/ACM International Conference on Software Engineering, ICSE 2024 year: 2024 ident: e_1_2_1_6_1 – volume-title: Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017 year: 2017 ident: e_1_2_1_45_1 – ident: e_1_2_1_54_1 doi: 10.1145/3510003.3510149 – volume-title: 31st ACM SIGSOFT International Symposium on Software Testing and Analysis year: 2022 ident: e_1_2_1_55_1 – volume-title: TestART: Improving LLM-based Unit Test via Co-evolution of Automated Generation and Repair Iteration. CoRR, abs/2408.03095 year: 2024 ident: e_1_2_1_19_1 – volume-title: Proc. ACM Softw. Eng., 1, FSE year: 2024 ident: e_1_2_1_21_1 – volume-title: 42nd International Conference on Software Engineering year: 2020 ident: e_1_2_1_49_1 – volume-title: A Systematic Literature Review on Large Language Models for Automated Program Repair. CoRR, abs/2405.01466 year: 2024 ident: e_1_2_1_59_1 – volume: 176 start-page: 107565 year: 2024 ident: e_1_2_1_7_1 article-title: A3Test publication-title: Assertion-Augmented Automated Test Case Generation. Inf. Softw. Technol. – volume-title: A Survey of Large Language Models. arXiv e-prints, abs/2303.18223 year: 2023 ident: e_1_2_1_66_1 – volume-title: Magicoder: Empowering Code Generation with OSS-Instruct. In Forty-first International Conference on Machine Learning, ICML 2024 year: 2024 ident: e_1_2_1_50_1 – ident: e_1_2_1_12_1 doi: 10.1109/ISSRE.2014.11 – volume-title: CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis. 
In The Eleventh International Conference on Learning Representations, ICLR 2023 year: 2023 ident: e_1_2_1_31_1 – ident: e_1_2_1_46_1 doi: 10.1109/TSE.2024.3368208 – volume-title: Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2023 year: 2023 ident: e_1_2_1_28_1 – ident: e_1_2_1_2_1 – ident: e_1_2_1_5_1 – ident: e_1_2_1_15_1 doi: 10.1007/s10009-014-0355-9 – ident: e_1_2_1_17_1 doi: 10.1145/2025113.2025179 – start-page: 2023 volume-title: Trans. Mach. Learn. Res. year: 2023 ident: e_1_2_1_26_1 – volume-title: A Survey on Large Language Models for Software Engineering. CoRR, abs/2312.15223 year: 2023 ident: e_1_2_1_60_1 – volume-title: 19th IEEE/ACM International Conference on Mining Software Repositories, MSR 2022 year: 2022 ident: e_1_2_1_43_1 – ident: e_1_2_1_57_1 doi: 10.1145/3631974 – volume-title: CasModaTest: A Cascaded and Model-Agnostic Self-Directed Framework for Unit Test Generation. CoRR, abs/2406.15743 year: 2024 ident: e_1_2_1_30_1 – volume-title: TOGA: A Neural Method for Test Oracle Generation. In 44th IEEE/ACM 44th International Conference on Software Engineering, ICSE 2022 year: 2022 ident: e_1_2_1_14_1 – volume-title: 7th European Software Engineering Conference, Held Jointly with the 7th ACM SIGSOFT Symposium on the Foundations of Software Engineering year: 1999 ident: e_1_2_1_18_1 – ident: e_1_2_1_64_1 doi: 10.1145/3699598 – volume-title: Symstra: A Framework for Generating Object-Oriented Unit Tests Using Symbolic Execution. 
In Tools and Algorithms for the Construction and Analysis of Systems, 11th International Conference, TACAS year: 2005 ident: e_1_2_1_51_1 – volume-title: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2023 year: 2023 ident: e_1_2_1_23_1 – volume-title: Shao Kun Deng, and Neel Sundaresan year: 2020 ident: e_1_2_1_44_1 – volume-title: Large Language Models for Software Engineering: A Systematic Literature Review. CoRR, abs/2308.10620 year: 2023 ident: e_1_2_1_24_1 – volume-title: OOPSLA 2007 year: 2007 ident: e_1_2_1_34_1 – volume-title: Identify and Update Test Cases When Production Code Changes: A Transformer-Based Approach. In 38th IEEE/ACM International Conference on Automated Software Engineering, ASE 2023 year: 2023 ident: e_1_2_1_25_1 – ident: e_1_2_1_1_1 – volume-title: DeepSeek-Coder: When the Large Language Model Meets Programming - The Rise of Code Intelligence. CoRR, abs/2401.14196 year: 2024 ident: e_1_2_1_20_1 – ident: e_1_2_1_58_1 doi: 10.1016/j.jss.2022.111419 – ident: e_1_2_1_61_1 doi: 10.1109/TDSC.2023.3308897 – volume-title: TestBench: Evaluating Class-Level Test Case Generation Capability of Large Language Models. arXiv e-prints, abs/2409.17561 year: 2024 ident: e_1_2_1_63_1 – volume: 32 year: 2023 ident: e_1_2_1_42_1 article-title: Revisiting the Identification of the Co-evolution of Production and Test Code publication-title: ACM Trans. Softw. Eng. Methodol. – volume-title: Proceedings of the 1999 International Conference on Software Engineering, ICSE’ 99 year: 1999 ident: e_1_2_1_13_1 – ident: e_1_2_1_11_1 doi: 10.1145/1950365.1950396 – volume: 1547 volume-title: CodeBERT: A Pre-Trained Model for Programming and Natural Languages. 
In Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16-20 November 2020 (Findings of ACL year: 2020 ident: e_1_2_1_16_1 – ident: e_1_2_1_3_1 – volume-title: An Empirical Study of Unit Test Generation with Large Language Models. CoRR, abs/2406.18181 year: 2024 ident: e_1_2_1_52_1 – volume-title: Gamma: Revisiting Template-Based Automated Program Repair Via Mask Prediction. In 38th IEEE/ACM International Conference on Automated Software Engineering, ASE 2023 year: 2023 ident: e_1_2_1_62_1 – volume-title: Proceedings of the 33rd International Conference on Software Engineering, ICSE 2011, Waikiki, Honolulu , HI, USA year: 2011 ident: e_1_2_1_9_1 – volume-title: CodeBLEU: a Method for Automatic Evaluation of Code Synthesis. CoRR, abs/2009.10297 year: 2020 ident: e_1_2_1_38_1 – volume-title: ChatUniTest: A Framework for LLM-Based Test Generation. In Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, FSE 2024, Porto de Galinhas year: 2024 ident: e_1_2_1_10_1 – volume: 21 year: 2020 ident: e_1_2_1_37_1 article-title: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer publication-title: J. Mach. Learn. Res. – volume-title: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event / Punta Cana year: 2021 ident: e_1_2_1_48_1 – volume-title: Proc. ACM Softw. Eng., 1, FSE year: 2024 ident: e_1_2_1_56_1 – ident: e_1_2_1_39_1 – volume-title: A Prompt Learning Framework for Source Code Summarization. CoRR, abs/2312.16066 year: 2023 ident: e_1_2_1_41_1 – volume-title: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023 year: 2023 ident: e_1_2_1_47_1 |
| StartPage | 1678 |
| SubjectTerms | Software and its engineering Software testing and debugging |
| Title | A Large-Scale Empirical Study on Fine-Tuning Large Language Models for Unit Testing |
| URI | https://dl.acm.org/doi/10.1145/3728951 |
| Volume | 2 |