Learning Deep Semantics for Test Completion

Bibliographic Details
Published in:	Proceedings / International Conference on Software Engineering pp. 2111-2123
Main Authors:	Nie, Pengyu, Banerjee, Rahul, Li, Junyi Jessy, Mooney, Raymond J., Gligoric, Milos
Format:	Conference Proceeding
Language:	English
Published:	IEEE 01.05.2023
Subjects:	Codes; Deep learning; deep neural networks; Java; Measurement; Predictive models; programming language semantics; Semantics; test completion; Writing
ISSN:	1558-1225

Abstract	Writing tests is a time-consuming yet essential task during software development. We propose to leverage recent advances in deep learning for text and code generation to assist developers in writing tests. We formalize the novel task of test completion: automatically completing the next statement in a test method based on the context of prior statements and the code under test. We develop TeCo, a deep learning model that uses code semantics for test completion. The key insight underlying TeCo is that predicting the next statement in a test method requires reasoning about code execution, which is hard to do with only the syntax-level data that existing code completion models use. TeCo extracts and uses six kinds of code semantics data, including the execution result of prior statements and the execution context of the test method. To provide a testbed for this new task, as well as to evaluate TeCo, we collect a corpus of 130,934 test methods from 1,270 open-source Java projects. Our results show that TeCo achieves an exact-match accuracy of 18%, which is 29% higher than the best baseline using syntax-level data only. When measuring the functional correctness of the generated next statement, TeCo generates runnable code in 29% of the cases, compared to 18% for the best baseline. Moreover, TeCo is significantly better than prior work on test oracle generation.
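The abstract frames test completion as predicting the next statement of a test method from its prior statements and the code under test, scored by exact match. The Python sketch below illustrates that task shape and metric under stated assumptions: the `CompletionExample` record, the `build_model_input` separator token, and whitespace tokenization are hypothetical choices for illustration, not TeCo's actual data format or evaluation code, and the six kinds of code semantics data are not modeled. It also reads "29% higher" as a relative gain over the baseline score.

```python
"""Hypothetical sketch of the test-completion task shape and an
exact-match metric. Not TeCo's implementation: the record layout,
input format, and whitespace tokenization are illustrative assumptions.
"""
from dataclasses import dataclass


@dataclass
class CompletionExample:
    # Hypothetical record: the method under test, the statements already
    # written in the test method, and the gold next statement.
    code_under_test: str
    prior_statements: list[str]
    gold_next_statement: str


def build_model_input(ex: CompletionExample, sep: str = " <sep> ") -> str:
    # Hypothetical input format: code under test, then prior statements,
    # joined by an assumed separator token.
    return sep.join([ex.code_under_test, *ex.prior_statements])


def tokens(stmt: str) -> list[str]:
    # Split on runs of whitespace so extra spaces, tabs, or newlines between
    # the same tokens do not change the comparison.
    return stmt.split()


def exact_match_accuracy(predictions: list[str],
                         examples: list[CompletionExample]) -> float:
    """Percentage of predictions whose tokens equal the gold next statement."""
    if len(predictions) != len(examples):
        raise ValueError("expected one prediction per example")
    if not examples:
        return 0.0
    hits = sum(
        tokens(pred) == tokens(ex.gold_next_statement)
        for pred, ex in zip(predictions, examples)
    )
    return 100.0 * hits / len(examples)


def relative_improvement(new: float, baseline: float) -> float:
    # Reads "X% higher" as a relative gain over the baseline score.
    return 100.0 * (new - baseline) / baseline


if __name__ == "__main__":
    examples = [
        CompletionExample(
            code_under_test="public boolean add(E e) { ... }",
            prior_statements=["List<Integer> list = new ArrayList<>();",
                              "list.add(3);"],
            gold_next_statement="assertEquals(1, list.size());",
        ),
        CompletionExample(
            code_under_test="public boolean isEmpty() { ... }",
            prior_statements=["Set<String> set = new HashSet<>();"],
            gold_next_statement="assertTrue(set.isEmpty());",
        ),
    ]
    predictions = ["assertEquals(1,  list.size());",
                   "assertFalse(set.isEmpty());"]
    print(exact_match_accuracy(predictions, examples))  # 50.0
    print(relative_improvement(12.0, 10.0))  # 20.0
```

Exact match is strict: a prediction that is written differently but checks the same thing, such as `assertTrue(list.size() == 1);`, scores zero, which is one reason the abstract also reports functional correctness (whether the generated statement is runnable).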
Affiliation:	UT Austin, USA (all authors)
Author emails:	Nie, Pengyu (pynie@utexas.edu); Banerjee, Rahul (rahulb517@utexas.edu); Li, Junyi Jessy (jessy@utexas.edu); Mooney, Raymond J. (mooney@utexas.edu); Gligoric, Milos (gligoric@utexas.edu)
CODEN	IEEPAD
DOI	10.1109/ICSE48619.2023.00178
Discipline	Computer Science
EISBN	9781665457019 1665457015
EISSN	1558-1225
EndPage	2123
ExternalDocumentID	10172620
Genre	orig-research
ISICitedReferencesCount	26
PageCount	13
PublicationDate	2023-May
PublicationTitle	Proceedings / International Conference on Software Engineering
PublicationTitleAbbrev	ICSE
PublicationYear	2023
Publisher	IEEE
StartPage	2111
URI	https://ieeexplore.ieee.org/document/10172620