A Large-Scale Empirical Study on Fine-Tuning Large Language Models for Unit Testing

Bibliographic Details
Published in: Proceedings of the ACM on Software Engineering, Vol. 2, Issue ISSTA, pp. 1678-1700
Main Authors: Shang, Ye; Zhang, Quanjun; Fang, Chunrong; Gu, Siqi; Zhou, Jianyi; Chen, Zhenyu
Format: Journal Article
Language: English
Published: New York, NY, USA: ACM, 22 June 2025
Keywords: Unit Testing; Large Language Model; AI for SE; Software Testing
ISSN: 2994-970X
Online Access: Full text
Abstract Unit testing plays a pivotal role in software development, improving software quality and reliability. However, generating effective test cases manually is time-consuming, prompting interest in unit testing research. Recently, Large Language Models (LLMs) have shown potential in various unit testing tasks, including test generation, assertion generation, and test evolution, but existing studies are limited in scope and lack a systematic evaluation of the effectiveness of LLMs. To bridge this gap, we present a large-scale empirical study on fine-tuning LLMs for unit testing. Our study involves three unit testing tasks, five benchmarks, eight evaluation metrics, and 37 popular LLMs across various architectures and sizes, consuming over 3,000 NVIDIA A100 GPU hours. We focus on three key research questions: (1) the performance of LLMs compared to state-of-the-art methods, (2) the impact of different factors on LLM performance, and (3) the effectiveness of fine-tuning versus prompt engineering. Our findings reveal that LLMs outperform existing state-of-the-art approaches on all three unit testing tasks across nearly all metrics, highlighting the potential of fine-tuning LLMs for unit testing. Furthermore, large-scale, decoder-only models achieve the best results across tasks, while encoder-decoder models perform better at the same parameter scale. Additionally, a comparison between fine-tuning and prompt engineering reveals the considerable potential of prompt engineering in unit testing tasks. We then discuss open concerns in the test generation task, including data leakage, bug detection capability, and metric comparisons. Finally, we pinpoint various practical guidelines for LLM-based approaches to unit testing tasks in the near future. Overall, our work demonstrates the promising future of fine-tuning LLMs for unit testing and reduces the manual effort of unit testing experts in practical scenarios.
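The study's core setup, fine-tuning a pre-trained code model on task-specific input/output pairs, can be illustrated with a short sketch. Below is a minimal, hedged example using Hugging Face Transformers with CodeT5, one of the encoder-decoder model families cited in this record; the file names, field names, and hyperparameters are illustrative assumptions in the style of Methods2Test-like focal-method/test pairs, not the authors' actual configuration.

# A minimal sketch (not the authors' released code): supervised fine-tuning
# of CodeT5 for test generation. File names, field names ("focal_method",
# "test_case"), and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codet5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("Salesforce/codet5-base")

# Hypothetical JSONL files: one record per (focal method, unit test) pair.
data = load_dataset("json", data_files={"train": "train.jsonl",
                                        "valid": "valid.jsonl"})

def preprocess(batch):
    # Encode the focal method as the source and the test case as the target.
    model_inputs = tokenizer(batch["focal_method"], max_length=512,
                             truncation=True, padding="max_length")
    labels = tokenizer(batch["test_case"], max_length=512,
                       truncation=True, padding="max_length")["input_ids"]
    # Mask padding tokens so they do not contribute to the loss.
    model_inputs["labels"] = [
        [tok if tok != tokenizer.pad_token_id else -100 for tok in seq]
        for seq in labels
    ]
    return model_inputs

data = data.map(preprocess, batched=True,
                remove_columns=data["train"].column_names)

args = Seq2SeqTrainingArguments(
    output_dir="codet5-testgen",
    per_device_train_batch_size=8,  # illustrative; the study tunes per model
    learning_rate=5e-5,
    num_train_epochs=3,
)
Seq2SeqTrainer(model=model, args=args, train_dataset=data["train"],
               eval_dataset=data["valid"]).train()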
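For contrast, the prompt engineering route compared in research question (3) requires no weight updates: a focal method and a test skeleton are packed into an instruction for an off-the-shelf instruct model. The sketch below is an assumption for illustration, not the paper's protocol; the model choice (DeepSeek-Coder, which appears among the model families referenced in this record) and the prompt wording are illustrative.

# A minimal sketch (an assumption, not the paper's protocol) of zero-shot
# assertion generation via prompt engineering with an instruct model.
from transformers import pipeline

generator = pipeline("text-generation",
                     model="deepseek-ai/deepseek-coder-1.3b-instruct")

focal_method = "public static int add(int a, int b) { return a + b; }"
test_prefix = ("@Test public void testAdd() { "
               "int result = Calculator.add(2, 3); <AssertPlaceHolder> }")

prompt = ("You are an expert in Java unit testing.\n"
          f"Focal method:\n{focal_method}\n"
          f"Test case with a missing assertion:\n{test_prefix}\n"
          "Replace <AssertPlaceHolder> with one JUnit assertion:")

print(generator(prompt, max_new_tokens=32)[0]["generated_text"])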
ArticleNumber ISSTA074
Author Gu, Siqi
Zhang, Quanjun
Chen, Zhenyu
Zhou, Jianyi
Shang, Ye
Fang, Chunrong
Author_xml – sequence: 1
  givenname: Ye
  orcidid: 0009-0000-8699-8075
  surname: Shang
  fullname: Shang, Ye
  email: 522023320132@smail.nju.edu.cn
  organization: Nanjing University, Nanjing, China
– sequence: 2
  givenname: Quanjun
  orcidid: 0000-0002-2495-3805
  surname: Zhang
  fullname: Zhang, Quanjun
  email: quanjun.zhang@smail.nju.edu.cn
  organization: Nanjing University of Science and Technology, Nanjing, China
– sequence: 3
  givenname: Chunrong
  orcidid: 0000-0002-9930-7111
  surname: Fang
  fullname: Fang, Chunrong
  email: fangchunrong@nju.edu.cn
  organization: Nanjing University, Nanjing, China
– sequence: 4
  givenname: Siqi
  orcidid: 0000-0001-5514-6734
  surname: Gu
  fullname: Gu, Siqi
  email: siqi.gu@smail.nju.edu.cn
  organization: Nanjing University, Nanjing, China
– sequence: 5
  givenname: Jianyi
  orcidid: 0000-0002-4867-5416
  surname: Zhou
  fullname: Zhou, Jianyi
  email: zhoujianyi2@huawei.com
  organization: Huawei Cloud Computing Technologies, Beijing, China
– sequence: 6
  givenname: Zhenyu
  orcidid: 0000-0002-9592-7022
  surname: Chen
  fullname: Chen, Zhenyu
  email: zychen@nju.edu.cn
  organization: Shenzhen Research Institute of Nanjing University, Shenzhen, China
CitedBy_id crossref_primary_10_3390_make7030097
Cites_doi 10.1145/1390630.1390635
10.1109/ASE.2015.49
10.1109/TSE.2023.3334955
10.1145/3510003.3510149
10.1109/ISSRE.2014.11
10.1109/TSE.2024.3368208
10.1007/s10009-014-0355-9
10.1145/2025113.2025179
10.1145/3631974
10.1145/3699598
10.1016/j.jss.2022.111419
10.1109/TDSC.2023.3308897
10.1145/1950365.1950396
ContentType Journal Article
Copyright Owner/Author
Copyright_xml – notice: Owner/Author
DOI 10.1145/3728951
DatabaseName CrossRef
DatabaseTitle CrossRef
DatabaseTitleList CrossRef
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISSN 2994-970X
EndPage 1700
ExternalDocumentID 10_1145_3728951
3728951
GrantInformation_xml – fundername: National Natural Science Foundation of China
  grantid: U24A20337, 61932012, 62372228
  funderid: https://doi.org/10.13039/501100001809
– fundername: Science, Technology and Innovation Commission of Shenzhen Municipality
  grantid: CJGJZD20200617103001003, 2021Szvup057
  funderid: https://doi.org/10.13039/501100010877
– fundername: CCF-Huawei Populus Grove Fund
  grantid: CCF-HuaweiSE202304, CCF-HuaweiSY202306
ISSN 2994-970X
IngestDate Tue Nov 18 21:57:37 EST 2025
Sat Nov 29 07:43:49 EST 2025
Mon Jul 14 20:48:59 EDT 2025
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue ISSTA
Keywords Unit Testing
Large Language Model
AI for SE
Software Testing
Language English
License This work is licensed under Creative Commons Attribution International 4.0.
LinkModel OpenURL
ORCID 0000-0002-4867-5416
0000-0002-9592-7022
0000-0001-5514-6734
0009-0000-8699-8075
0000-0002-2495-3805
0000-0002-9930-7111
OpenAccessLink https://dl.acm.org/doi/10.1145/3728951
PageCount 23
ParticipantIDs crossref_primary_10_1145_3728951
crossref_citationtrail_10_1145_3728951
acm_primary_3728951
PublicationCentury 2000
PublicationDate 20250622
2025-06-22
PublicationDateYYYYMMDD 2025-06-22
PublicationDate_xml – month: 06
  year: 2025
  text: 20250622
  day: 22
PublicationDecade 2020
PublicationPlace New York, NY, USA
PublicationPlace_xml – name: New York, NY, USA
PublicationTitle Proceedings of the ACM on software engineering
PublicationTitleAbbrev ACM PACMSE
PublicationYear 2025
Publisher ACM
Publisher_xml – name: ACM
References 2024. Commons Lang. https://commons.apache.org/proper/commons-lang
Tao Xie, Darko Marinov, Wolfram Schulte, and David Notkin. 2005. Symstra: A Framework for Generating Object-Oriented Unit Tests Using Symbolic Execution. In Tools and Algorithms for the Construction and Analysis of Systems, 11th International Conference, TACAS 2005, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2005, April 4-8, 2005, Proceedings (Lecture Notes in Computer Science, Vol. 3440). Springer, 365–381.
A. Jefferson Offutt and Aynur Abdurazik. 1999. Generating Tests from UML Specifications. In «UML» ’99: The Unified Modeling Language - Beyond the Standard, Second International Conference, Fort Collins, CO, USA, October 28-30, 1999, Proceedings (Lecture Notes in Computer Science, Vol. 1723). Springer, 416–429.
Luciano Baresi and Matteo Miraz. 2010. TestFul: Automatic Unit-Test Generation for Java Classes. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 2, ICSE 2010, Cape Town, South Africa, 1-8 May 2010. ACM, 281–284.
Gordon Fraser and Andrea Arcuri. 2011. Evosuite: Automatic Test Suite Generation for Object-Oriented Software. In SIGSOFT/FSE’11 19th ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE-19) and ESEC’11: 13th European Software Engineering Conference (ESEC-13), Szeged, Hungary, September 5-9, 2011. ACM, 416–419.
Siqi Gu, Chunrong Fang, Quanjun Zhang, Fangyuan Tian, Jianyi Zhou, and Zhenyu Chen. 2024. TestART: Improving LLM-based Unit Test via Co-evolution of Automated Generation and Repair Iteration. CoRR, abs/2408.03095 (2024), arXiv–2408.
Soneya Binta Hossain, Antonio Filieri, Matthew B. Dwyer, Sebastian G. Elbaum, and Willem Visser. 2023. Neural-Based Test Oracle Generation: A Large-Scale Evaluation and Lessons Learned. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2023, San Francisco, CA, USA, December 3-9, 2023. ACM, 120–132.
2024. JFreeChart. https://jfree.org/jfreechart
Toufique Ahmed, Kunal Suresh Pai, Premkumar T. Devanbu, and Earl T. Barr. 2024. Automatic Semantic Augmentation of Language Model Prompts (for Code Summarization). In Proceedings of the 46th IEEE/ACM International Conference on Software Engineering, ICSE 2024, Lisbon, Portugal, April 14-20, 2024. ACM, 220:1–220:13.
Soneya Binta Hossain and Matthew B. Dwyer. 2024. TOGLL: Correct and Strong Test Oracle Generation with LLMs. CoRR, abs/2405.03786 (2024), arXiv–2405.
Quanjun Zhang, Ye Shang, Chunrong Fang, Siqi Gu, Jianyi Zhou, and Zhenyu Chen. 2024. TestBench: Evaluating Class-Level Test Case Generation Capability of Large Language Models. arXiv e-prints, abs/2409.17561 (2024), arXiv–2409.
Quanjun Zhang, Chunrong Fang, Yang Xie, Yuxiang Ma, Weisong Sun, Yun Yang, and Zhenyu Chen. 2024. A Systematic Literature Review on Large Language Models for Automated Program Repair. CoRR, abs/2405.01466 (2024), arXiv–2405.
Michele Tufano, Dawn Drain, Alexey Svyatkovskiy, Shao Kun Deng, and Neel Sundaresan. 2020. Unit Test Case Generation with Transformers and Focal Context. arXiv e-prints, abs/2009.05617 (2020), arXiv–2009.
Angelo Gargantini and Constance L. Heitmeyer. 1999. Using Model Checking to Generate Tests from Requirements Specifications. In Software Engineering - ESEC/FSE’99, 7th European Software Engineering Conference, Held Jointly with the 7th ACM SIGSOFT Symposium on the Foundations of Software Engineering, Toulouse, France, September 1999, Proceedings (Lecture Notes in Computer Science, Vol. 1687). Springer, 146–162.
2024. google gson. https://github.com/google/gson
Yue Wang, Weishi Wang, Shafiq R. Joty, and Steven C. H. Hoi. 2021. CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 7-11 November, 2021. Association for Computational Linguistics, 8696–8708.
Quanjun Zhang, Chunrong Fang, Weisong Sun, Shengcheng Yu, Yutao Xu, and Yulei Liu. 2022. Test Case Prioritization Using Partial Attention. J. Syst. Softw., 192 (2022), 111419.
Cristian Cadar, Patrice Godefroid, Sarfraz Khurshid, Corina S. Pasareanu, Koushik Sen, Nikolai Tillmann, and Willem Visser. 2011. Symbolic Execution for Software Testing in Practice: Preliminary Assessment. In Proceedings of the 33rd International Conference on Software Engineering, ICSE 2011, Waikiki, Honolulu, HI, USA, May 21-28, 2011. ACM, 1066–1071.
Weifeng Sun, Meng Yan, Zhongxin Liu, Xin Xia, Yan Lei, and David Lo. 2023. Revisiting the Identification of the Co-evolution of Production and Test Code. ACM Trans. Softw. Eng. Methodol., 32, 6 (2023), 152:1–152:37.
Michele Tufano, Shao Kun Deng, Neel Sundaresan, and Alexey Svyatkovskiy. 2022. METHODS2TEST: A Dataset of Focal Methods Mapped to Test Cases. In 19th IEEE/ACM International Conference on Mining Software Repositories, MSR 2022, Pittsburgh, PA, USA, May 23-24, 2022. ACM, 299–303.
Quanjun Zhang, Chunrong Fang, Yuxiang Ma, Weisong Sun, and Zhenyu Chen. 2024. A Survey of Learning-based Automated Program Repair. ACM Trans. Softw. Eng. Methodol., 33, 2 (2024), 55:1–55:69.
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA. 5998–6008.
Eduard Paul Enoiu, Adnan Causevic, Thomas J. Ostrand, Elaine J. Weyuker, Daniel Sundmark, and Paul Pettersson. 2016. Automated Test Generation Using Model Checking: An Industrial Evaluation. Int. J. Softw. Tools Technol. Transf., 18, 3 (2016), 335–353.
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, July 6-12, 2002, Philadelphia, PA, USA. ACL, 311–318.
Elizabeth Dinella, Gabriel Ryan, Todd Mytkowicz, and Shuvendu K. Lahiri. 2022. TOGA: A Neural Method for Test Oracle Generation. In 44th IEEE/ACM 44th International Conference on Software Engineering, ICSE 2022, Pittsburgh, PA, USA, May 25-27, 2022. ACM, 2130–2141.
Chao Ni, Xiaoya Wang, Liushan Chen, Dehai Zhao, Zhengong Cai, Shaohua Wang, and Xiaohu Yang. 2024. CasModaTest: A Cascaded and Model-Agnostic Self-Directed Framework for Unit Test Generation. CoRR, abs/2406.15743 (2024), arXiv–2406.
Saranya Alagarsamy, Chakkrit Tantithamthavorn, and Aldeida Aleti. 2024. A3Test: Assertion-Augmented Automated Test Case Generation. Inf. Softw. Technol., 176 (2024), 107565.
Ermira Daka and Gordon Fraser. 2014. A Survey on Unit Testing Practices and Problems. In 25th IEEE International Symposium on Software Reliability Engineering, ISSRE 2014, Naples, Italy, November 3-6, 2014. IEEE Computer Society, 201–211.
Vitaly Chipounov, Volodymyr Kuznetsov, and George Candea. 2011. S2E: a Platform for In-Vivo Multi-Path Analysis of Software Systems. In Proceedings of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2011, Newport Beach, CA, USA, March 5-11, 2011. ACM, 265–278.
Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul F. Christiano, Jan Leike, and Ryan Lowe. 2022. Training Language Models to Follow Instructions with Human Feedback. In Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022. 35, 27730–27744.
Quanjun Zhang, Chunrong Fang, Yang Xie, Yaxin Zhang, Yun Yang, Weisong Sun, Shengcheng Yu, and Zhenyu Chen. 2023. A Survey on Large Language Models for Software Engineering. CoRR, abs/2312.15223 (2023), arXiv–2312.
Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, and Caiming Xiong. 2023. CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net.
Carlos Pacheco and Michael D. Ernst. 2007. Randoop: Feedback-Directed Random Testing for Java. In Companion to the 22nd Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2007, October 21-25, 2007, Montreal, Quebec, Canada. ACM, 815–816.
Lei Ma, Cyrille Artho, Cheng Zhang, Hiroyuki Sato, Johannes Gmeiner, and Rudolf Ramler. 2015. GRT: Program-Analysis-Guided Random Testing (T). In 30th IEEE/ACM International Conference on Automated Software Engineering, ASE 2015, Lincoln, NE, USA, November 9-13, 2015. IEEE Computer Society, 212–223.
Siddhartha R. Dalal, Ashish Jain, Nachimuthu Karunanithi, J. M. Leaton, Christopher M. Lott, Gardner C. Patton, and Bruce M. Horowitz. 1999. Model-Based Testing in Practice. In Proceedings of the 1999 International Conference on Software Engineering, ICSE’ 99, Los Angeles, CA, USA, May 16-22, 1999. ACM, 285–294.
Daya Guo, Qihao Zhu, Dejian Yang, Zhenda Xie, Kai Dong, Wentao Zhang, Guanting Chen, Xiao Bi, Y. Wu, Y. K. Li, Fuli Luo, Yingfei Xiong, and Wenfeng Liang. 2024. DeepSeek-Coder: When the Large Language Model Meets Programming - The Rise of Code Intelligence. CoRR, abs/2401.14196 (2024), arXiv–2401.
Quanjun Zhang, Chunrong Fang, Tongke Zhang, Bowen Yu, Weisong Sun, and Zhenyu Chen. 2023. Gamma: Revisiting Template-Based Automated Program Repair Via Mask Prediction. In 38th IEEE/ACM International Conference on Automated Software Engineering, ASE 2023, Luxembourg, September 11-15, 2023. IEEE, 535–547.
SourceID crossref
acm
SourceType Enrichment Source
Index Database
Publisher
StartPage 1678
SubjectTerms Software and its engineering
Software testing and debugging
SubjectTermsDisplay Software and its engineering -- Software testing and debugging
Title A Large-Scale Empirical Study on Fine-Tuning Large Language Models for Unit Testing
URI https://dl.acm.org/doi/10.1145/3728951
Volume 2
hasFullText 1
inHoldings 1