Learning Deep Semantics for Test Completion

Bibliographic Details
Published in: Proceedings / International Conference on Software Engineering, pp. 2111-2123
Main Authors: Nie, Pengyu; Banerjee, Rahul; Li, Junyi Jessy; Mooney, Raymond J.; Gligoric, Milos
Format: Conference Proceeding
Language: English
Published: IEEE 01.05.2023
ISSN:1558-1225
Abstract Writing tests is a time-consuming yet essential task during software development. We propose to leverage recent advances in deep learning for text and code generation to assist developers in writing tests. We formalize the novel task of test completion: automatically completing the next statement in a test method based on the context of prior statements and the code under test. We develop TeCo, a deep learning model that uses code semantics for test completion. The key insight underlying TeCo is that predicting the next statement in a test method requires reasoning about code execution, which is hard to do with only the syntax-level data that existing code completion models use. TeCo extracts and uses six kinds of code semantics data, including the execution result of prior statements and the execution context of the test method. To provide a testbed for this new task, as well as to evaluate TeCo, we collect a corpus of 130,934 test methods from 1,270 open-source Java projects. Our results show that TeCo achieves an exact-match accuracy of 18, which is 29% higher than the best baseline using syntax-level data only. When measuring the functional correctness of the generated next statement, TeCo can generate runnable code in 29% of the cases, compared to 18% obtained by the best baseline. Moreover, TeCo is significantly better than prior work on test oracle generation.
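The abstract reports exact-match accuracy without defining it. A minimal sketch of the metric, assuming the common definition (the percentage of predicted next statements that are identical to the reference statement), might look as follows. The whitespace normalization, the function names, and the sample Java statements are all assumptions made for illustration; this is not the authors' evaluation code.

```python
def normalize(statement: str) -> str:
    """Collapse runs of whitespace so formatting differences are not counted as mismatches.

    This normalization is an assumption of the sketch, not something the abstract specifies.
    """
    return " ".join(statement.split())


def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Return the percentage of predictions identical to their reference statement."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    matches = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return 100.0 * matches / len(references)


# Hypothetical reference next statements for four Java test methods.
references = [
    "assertEquals(3, stack.size());",
    "assertTrue(list.isEmpty());",
    "parser.parse(input);",
    "assertNull(cache.get(key));",
]
# Hypothetical model predictions for the same four positions.
predictions = [
    "assertEquals(3,  stack.size());",  # identical after whitespace normalization
    "assertFalse(list.isEmpty());",     # different assertion, counted as a miss
    "parser.parse(input);",             # identical
    "assertNotNull(cache.get(key));",   # different assertion, counted as a miss
]

accuracy = exact_match_accuracy(predictions, references)
print(f"exact-match accuracy: {accuracy:.1f}")
```

Exact match is a strict criterion: a prediction that differs textually from the reference counts as a miss even if it would run and pass, which is presumably why the abstract also reports how often the generated statement is runnable.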
Author_xml – sequence: 1
  givenname: Pengyu
  surname: Nie
  fullname: Nie, Pengyu
  email: pynie@utexas.edu
  organization: UT,Austin,USA
– sequence: 2
  givenname: Rahul
  surname: Banerjee
  fullname: Banerjee, Rahul
  email: rahulb517@utexas.edu
  organization: UT,Austin,USA
– sequence: 3
  givenname: Junyi Jessy
  surname: Li
  fullname: Li, Junyi Jessy
  email: jessy@utexas.edu
  organization: UT,Austin,USA
– sequence: 4
  givenname: Raymond J.
  surname: Mooney
  fullname: Mooney, Raymond J.
  email: mooney@utexas.edu
  organization: UT,Austin,USA
– sequence: 5
  givenname: Milos
  surname: Gligoric
  fullname: Gligoric, Milos
  email: gligoric@utexas.edu
  organization: UT,Austin,USA
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/ICSE48619.2023.00178
Discipline Computer Science
EISBN 9781665457019
1665457015
EISSN 1558-1225
EndPage 2123
ExternalDocumentID 10172620
Genre orig-research
ISICitedReferencesCount 26
IsPeerReviewed false
IsScholarly true
Language English
PageCount 13
PublicationCentury 2000
PublicationDate 2023-May
PublicationDateYYYYMMDD 2023-05-01
PublicationDate_xml – month: 05
  year: 2023
  text: 2023-May
PublicationDecade 2020
PublicationTitle Proceedings / International Conference on Software Engineering
PublicationTitleAbbrev ICSE
PublicationYear 2023
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 2111
SubjectTerms Codes
Deep learning
deep neural networks
Java
Measurement
Predictive models
programming language semantics
Semantics
test completion
Writing
URI https://ieeexplore.ieee.org/document/10172620