Generating Variable Explanations via Zero-shot Prompt Learning

As basic elements in program, variables convey essential information that is critical for program comprehension and maintenance. However, understanding the meanings of variables in program is not always easy for developers, since poor-quality variable names are prevalent while such variable are less...

Celý popis

Uloženo v:

Podrobná bibliografie
Vydáno v:	IEEE/ACM International Conference on Automated Software Engineering : [proceedings] s. 748 - 760
Hlavní autoři:	Wang, Chong, Lou, Yiling, Liu, Junwei, Peng, Xin
Médium:	Konferenční příspěvek
Jazyk:	angličtina
Vydáno:	IEEE 11.09.2023
Témata:	code pretrained models Codes Maintenance engineering Measurement naming quality Natural languages prompt learning Representation learning Task analysis Training data variable explanation
ISSN:	2643-1572
On-line přístup:	Získat plný text
Tagy:	Přidat tag Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!

Popis
Shrnutí:	As basic elements in program, variables convey essential information that is critical for program comprehension and maintenance. However, understanding the meanings of variables in program is not always easy for developers, since poor-quality variable names are prevalent while such variable are less informative for program comprehension. Therefore, in this paper, we target at generating concise natural language explanations for variables to facilitate program comprehension. In particular, there are two challenges in variable explanation generation, including the lack of training data and the association with complex code contexts around the variable. To address these issues, we propose a novel approach ZeroVar,which leverages code pre-trained models and zero-shot prompt learning to generate explanations for the variable based on its code context. ZeroVarcontains two stages: (i) a pre-training stage that continually pre-trains a base model (i.e., CodeT5) to recover the randomly-masked parameter descriptions in method docstrings; and (ii) a zero-shot prompt learning stage that leverages the pre-trained model to generate explanations for a given variable via the prompt constructed with the variable and its belonging method context. We then extensively evaluate the quality and usefulness of the variable explanations generated by ZeroVar.We construct an evaluation dataset of 773 variables and their reference explanations. Our results show that ZeroVarcan generate higher-quality explanations than baselines, not only on automated metrics such as BLEU and ROUGE, but also on human metrics such as correctness, completeness, and conciseness. Moreover, we further assess the usefulness of ZeroVAR-generated explanations on two downstream tasks related to variable naming quality, i.e., abbreviation expansion and spelling correction. For abbreviation expansion, the generated variable explanations can help improve the present rate (+13.1%), precision (+3.6%), and recall (+10.0%) of the state-of-the-art abbreviation explanation approach. For spelling correction, by using the generated explanations we can achieve higher hit@1 (+162.9(%) and hit@3 (+49.6%) than the recent variable representation learning approach.
ISSN:	2643-1572
DOI:	10.1109/ASE56229.2023.00130