Generating Variable Explanations via Zero-shot Prompt Learning

Published in: IEEE/ACM International Conference on Automated Software Engineering : [proceedings], pp. 748-760
Main authors: Wang, Chong; Lou, Yiling; Liu, Junwei; Peng, Xin
Format: Conference paper
Language: English
Publication details: IEEE, 11 September 2023
ISSN: 2643-1572
Abstract As basic elements in programs, variables convey essential information that is critical for program comprehension and maintenance. However, understanding the meaning of variables in a program is not always easy for developers, since poor-quality variable names are prevalent and such variables are less informative for program comprehension. Therefore, in this paper, we aim to generate concise natural language explanations for variables to facilitate program comprehension. In particular, there are two challenges in variable explanation generation: the lack of training data and the association with complex code contexts around the variable. To address these issues, we propose a novel approach, ZeroVar, which leverages code pre-trained models and zero-shot prompt learning to generate explanations for a variable based on its code context. ZeroVar contains two stages: (i) a pre-training stage that continually pre-trains a base model (i.e., CodeT5) to recover randomly-masked parameter descriptions in method docstrings; and (ii) a zero-shot prompt learning stage that leverages the pre-trained model to generate explanations for a given variable via a prompt constructed from the variable and its enclosing method context. We then extensively evaluate the quality and usefulness of the variable explanations generated by ZeroVar. We construct an evaluation dataset of 773 variables and their reference explanations. Our results show that ZeroVar can generate higher-quality explanations than baselines, not only on automated metrics such as BLEU and ROUGE, but also on human metrics such as correctness, completeness, and conciseness. Moreover, we further assess the usefulness of ZeroVar-generated explanations on two downstream tasks related to variable naming quality, i.e., abbreviation expansion and spelling correction. For abbreviation expansion, the generated variable explanations can help improve the present rate (+13.1%), precision (+3.6%), and recall (+10.0%) of the state-of-the-art abbreviation expansion approach. For spelling correction, by using the generated explanations we can achieve higher hit@1 (+162.9%) and hit@3 (+49.6%) than the recent variable representation learning approach.
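To make the two-stage approach described in the abstract concrete, the following is a minimal sketch of the zero-shot prompt-learning step, assuming the publicly available Salesforce/codet5-base checkpoint from Hugging Face and an illustrative docstring-style prompt template. The paper's continually pre-trained weights and exact prompt format are not reproduced here; the helper name explain_variable, the template, and the toy Java method are hypothetical.

```python
# Illustrative sketch only: the checkpoint, prompt template, and helper
# function are assumptions, not the ZeroVar release.
from transformers import AutoTokenizer, T5ForConditionalGeneration

MODEL_NAME = "Salesforce/codet5-base"  # base model; ZeroVar continues pre-training it on docstrings
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = T5ForConditionalGeneration.from_pretrained(MODEL_NAME)

def explain_variable(method_source: str, variable_name: str) -> str:
    """Generate a short natural-language explanation for `variable_name`
    given its enclosing method, by filling a masked docstring slot."""
    # Frame the explanation as a masked @param description, mirroring the
    # mask-recovery pre-training objective described in the abstract
    # (this particular template is a guess, not the paper's prompt).
    prompt = f"{method_source}\n/** @param {variable_name} <extra_id_0> */"
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
    output_ids = model.generate(**inputs, max_new_tokens=32, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Hypothetical usage on a toy Java method.
java_method = 'public int countLines(String path) { int n = read(path).split("\\n").length; return n; }'
print(explain_variable(java_method, "n"))
```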
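The abstract also reports automatic metrics (BLEU and ROUGE) against reference explanations. Below is an illustrative way to compute them with the sacrebleu and rouge_score packages; the explanation/reference pairs are made up stand-ins for the 773-variable evaluation set, and the paper's exact metric configuration (tokenization, ROUGE variants) is not specified in this record.

```python
# Illustrative metric computation over hypothetical data, not the paper's setup.
import sacrebleu
from rouge_score import rouge_scorer

# Hypothetical generated explanations and their reference explanations.
generated = ["number of lines in the file", "index of the current token"]
references = ["the number of lines read from the file", "current token index"]

# Corpus-level BLEU over all pairs.
bleu = sacrebleu.corpus_bleu(generated, [references])

# Average ROUGE-L F1 over all pairs.
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
rouge_l = sum(
    scorer.score(ref, hyp)["rougeL"].fmeasure
    for ref, hyp in zip(references, generated)
) / len(generated)

print(f"BLEU: {bleu.score:.2f}  ROUGE-L F1: {rouge_l:.3f}")
```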
Author Lou, Yiling
Liu, Junwei
Peng, Xin
Wang, Chong
Author_xml – sequence: 1
  givenname: Chong
  surname: Wang
  fullname: Wang, Chong
  email: wangchong20@fudan.edu.cn
  organization: School of Computer Science, Fudan University, Shanghai Key Laboratory of Data Science, China
– sequence: 2
  givenname: Yiling
  surname: Lou
  fullname: Lou, Yiling
  email: yilinglou@fudan.edu.cn
  organization: School of Computer Science, Fudan University, Shanghai Key Laboratory of Data Science, China
– sequence: 3
  givenname: Junwei
  surname: Liu
  fullname: Liu, Junwei
  email: 22210240218@m.fudan.edu.cn
  organization: School of Computer Science, Fudan University, Shanghai Key Laboratory of Data Science, China
– sequence: 4
  givenname: Xin
  surname: Peng
  fullname: Peng, Xin
  email: pengxin@fudan.edu.cn
  organization: School of Computer Science, Fudan University, Shanghai Key Laboratory of Data Science, China
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/ASE56229.2023.00130
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 9798350329964
EISSN 2643-1572
EndPage 760
ExternalDocumentID 10298410
Genre orig-research
GrantInformation_xml – fundername: National Natural Science Foundation of China
  grantid: 61972098
  funderid: 10.13039/501100001809
GroupedDBID 6IE
6IF
6IH
6IK
6IL
6IM
6IN
6J9
AAJGR
AAWTH
ABLEC
ACREN
ADYOE
ADZIZ
AFYQB
ALMA_UNASSIGNED_HOLDINGS
AMTXH
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CBEJK
CHZPO
IEGSK
IPLJI
M43
OCL
RIE
RIL
IEDL.DBID RIE
ISICitedReferencesCount 6
IngestDate Wed Aug 27 02:32:41 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
PageCount 13
ParticipantIDs ieee_primary_10298410
PublicationCentury 2000
PublicationDate 2023-Sept.-11
PublicationDateYYYYMMDD 2023-09-11
PublicationDate_xml – month: 09
  year: 2023
  text: 2023-Sept.-11
  day: 11
PublicationDecade 2020
PublicationTitle IEEE/ACM International Conference on Automated Software Engineering : [proceedings]
PublicationTitleAbbrev ASE
PublicationYear 2023
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0051577
ssib057256115
SourceID ieee
SourceType Publisher
StartPage 748
SubjectTerms code pretrained models
Codes
Maintenance engineering
Measurement
naming quality
Natural languages
prompt learning
Representation learning
Task analysis
Training data
variable explanation
Title Generating Variable Explanations via Zero-shot Prompt Learning
URI https://ieeexplore.ieee.org/document/10298410
WOSCitedRecordID wos001103357200060