An Empirical Study of Code Simplification Methods in Code Intelligence Tasks.

Bibliographic details
Title: An Empirical Study of Code Simplification Methods in Code Intelligence Tasks.
Authors: Shen, Zongwen; Li, Yuning; Ge, Jidong; Chen, Xiang; Li, Chuanyi; Huang, LiGuo; Luo, Bin
Source: ACM Transactions on Software Engineering & Methodology; Nov 2025, Vol. 34, Issue 8, pp. 1-31 (31 pp.)
Subjects: EMPIRICAL research, SOURCE code, ARTIFICIAL intelligence, SOFTWARE engineering, MODELING languages (Computer science), PROGRAM transformation
Abstract: In recent years, pre-trained language models have seen significant success in natural language processing and have been increasingly applied to code-related tasks. Code intelligence tasks have shown promising performance with the support of code pre-trained language models. Pre-processing code simplification methods have been introduced to prune code tokens from the model's input while maintaining task effectiveness. These methods improve the efficiency of code intelligence tasks while reducing computational costs. Post-prediction code simplification methods provide explanations for code intelligence task outcomes, enhancing the reliability and interpretability of model predictions. However, comprehensive evaluations of these methods across diverse code pre-trained model architectures and code intelligence tasks are lacking. To assess the effectiveness of code simplification methods, we conduct an empirical study integrating these code simplification methods with various pre-trained code models across multiple code intelligence tasks. Our empirical findings suggest that developing task-specific code simplification methods would be beneficial. Then, we recommend leveraging post-prediction methods to summarize prior knowledge, which can pre-process code simplification strategies. Moreover, establishing more evaluation mechanisms for code simplification is crucial. Finally, we propose incorporating code simplification methods into the pre-training phase of code pre-trained models to enhance their program comprehension and code representation capabilities. [ABSTRACT FROM AUTHOR]
Copyright of ACM Transactions on Software Engineering & Methodology is the property of Association for Computing Machinery and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Complementary Index
Description
ISSN: 1049-331X
DOI: 10.1145/3720540