iASTMapper: An Iterative Similarity-Based Abstract Syntax Tree Mapping Algorithm

Bibliographic details
Published in: IEEE/ACM International Conference on Automated Software Engineering : [proceedings], pp. 863-874
Main authors: Zhang, Neng; Chen, Qinde; Zheng, Zibin; Zou, Ying
Format: Conference paper
Language: English
Published: IEEE, 11 September 2023
ISSN:2643-1572
Description
Summary: Abstract syntax tree (AST) mapping algorithms are widely used to locate the code changes in a file revision by mapping the AST nodes of the source code before and after the changes. Recent differential testing of three state-of-the-art AST mapping algorithms, i.e., GumTree, MTDiff, and IJM, reveals that the algorithms generate inaccurate mappings for a considerable number of file revisions. We find that the inaccurate mappings could be caused by mutual influence: the mappings of lower-level AST nodes (e.g., tokens) affect the mappings of higher-level AST nodes (e.g., statements) and vice versa. This mutual influence issue is rarely considered by existing algorithms. In this paper, we propose an algorithm, called iASTMapper, that iteratively maps two ASTs based on the similarities between AST nodes. Given a file revision, we extract three types of AST nodes at different levels of program structure (i.e., tokens, statements, and inner-statements) from the ASTs of the two source code files. We first build mappings of the unchanged statements and inner-statements. Then, we use an iterative method to map the remaining unmapped nodes. For each of the three types of nodes, we iteratively map the nodes based on their similarities, measured using heuristic rules. We further use an iterative mechanism to connect the three iterative mapping processes by considering the mutual influence between the mappings of different types of nodes. Finally, a series of code edit actions is generated from the node mappings to help users understand and locate the code changes made during revisions. We conduct experiments to compare iASTMapper with three baselines, i.e., GumTree, MTDiff, and IJM, by automatically evaluating 210,997 file revisions from ten Java projects. Furthermore, we manually evaluate the correctness of the code edit actions generated for 200 file revisions with 12 evaluators. The results demonstrate that iASTMapper outperforms the baselines: it generates code edit actions that are at least 1.29% shorter than those of the baselines, with a high accuracy of 96.23%.
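To make the general idea in the summary concrete, the following is a minimal Python sketch of an iterative, similarity-based mapping at the statement level only. It is not the iASTMapper implementation: the Dice token overlap, the neighbour bonus of 0.2, the threshold of 0.6, and all function names are assumptions chosen for illustration, and the paper's three node types and heuristic rules are not reproduced. The sketch maps unchanged statements first, then repeatedly maps the best remaining pair, so that earlier mappings can influence later scores, and finally derives simple edit actions.

```python
from collections import Counter


def dice(a, b):
    """Dice similarity between two token lists, treated as multisets."""
    total = len(a) + len(b)
    return 2 * sum((Counter(a) & Counter(b)).values()) / total if total else 1.0


def score(i, j, old, new, mapping):
    """Token overlap plus a bonus (assumed heuristic) when the preceding
    statements are already mapped to each other."""
    bonus = 0.2 if i > 0 and mapping.get(i - 1) == j - 1 else 0.0
    return dice(old[i], new[j]) + bonus


def map_statements(old, new, threshold=0.6):
    """Return a dict mapping indices of `old` statements to indices of `new`."""
    mapping = {}
    free_new = set(range(len(new)))
    # Phase 1: map unchanged statements (identical token sequences) first.
    for i, stmt in enumerate(old):
        j = next((j for j in sorted(free_new) if new[j] == stmt), None)
        if j is not None:
            mapping[i] = j
            free_new.remove(j)
    # Phase 2: repeatedly map the best remaining pair. Scores are recomputed
    # each round because new mappings change the neighbour bonus.
    while True:
        candidates = [
            (-score(i, j, old, new, mapping), i, j)
            for i in range(len(old)) if i not in mapping
            for j in free_new
        ]
        best = min(candidates, default=None)
        if best is None or -best[0] < threshold:
            break
        _, i, j = best
        mapping[i] = j
        free_new.remove(j)
    return mapping


def edit_actions(old, new, mapping):
    """Derive update/delete/insert actions from a statement mapping."""
    actions = [("update", i, j) for i, j in sorted(mapping.items()) if old[i] != new[j]]
    actions += [("delete", i) for i in range(len(old)) if i not in mapping]
    mapped_new = set(mapping.values())
    actions += [("insert", j) for j in range(len(new)) if j not in mapped_new]
    return actions


# Hypothetical example: each statement is a list of tokens.
OLD = [["int", "x", "=", "1", ";"],
       ["foo", "(", "x", ")", ";"],
       ["return", "x", ";"]]
NEW = [["int", "x", "=", "1", ";"],
       ["foo", "(", "x", ",", "y", ")", ";"],
       ["log", "(", "y", ")", ";"],
       ["return", "x", ";"]]

MAPPING = map_statements(OLD, NEW)
ACTIONS = edit_actions(OLD, NEW, MAPPING)
```

In this example the two identical statements are mapped in the first phase, the changed call to `foo` is mapped in the second phase (helped by the already mapped preceding statement), and the new `log` call is reported as an insertion.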
DOI:10.1109/ASE56229.2023.00178