BERT4Anno: An annotation misuse detection method for Java

Bibliographic details
Published in: Information and software technology Vol. 184; p. 107763
Main authors: Yang, Jingbo, Ji, Xin, Wu, Wenjun, Liao, Xingchuang, Zhang, Kui, Dong, Linxiao, Xiang, Nan, Jian, Ren
Format: Journal Article
Language: English
Published: Elsevier B.V. 01.08.2025
ISSN: 0950-5849
Description
Summary: Developers leverage Java annotations to implement functions such as creating objects and operating databases. However, mastering annotations is challenging, and misused annotations might cause an application to crash. Although state-of-the-art techniques attempt to solve this problem, they neither summarize the types of Java annotation misuse nor leverage project-level information, which results in low efficiency in detecting annotation misuses. This work aims to summarize Java annotation misuse types and to provide a more efficient method for detecting misused annotations. First, to categorize Java annotation misuses, we conduct an empirical study and curate 321 annotation misuse questions from Stack Overflow. Second, to better detect these misuses, we propose a BERT-based method, BERT4Anno, which takes project structure and resource configuration into account, factors often neglected by state-of-the-art methods. In BERT4Anno, a novel Annotation Usage Project Representation (AUPR) technique is designed to leverage the interconnections among source code, configuration, and project structure. Moreover, an AUPR-based Named Entity Recognition (ANER) task, implemented by fine-tuning BERT, is devised to learn annotation usage knowledge. With this knowledge, the fine-tuned model can detect misused annotations. Finally, to evaluate the proposed method, two datasets, mainly curated from GitHub and comprising 404 Java projects/files with annotation misuse instances, are used for the experiments. The Java annotation misuses are categorized into 14 types based on how the curated questions violate correct annotation usage knowledge. The comparison experiment demonstrates the superior performance of our method over state-of-the-art baselines in terms of precision, recall, and F1 score, while our visualization technique provides insightful interpretations of the mechanism underlying the model's outperformance.
By leveraging the project-level information, our proposed method can predict the appropriate types and positions of annotations and subsequently identify the misused annotations, making the detection more efficient.
DOI:10.1016/j.infsof.2025.107763
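
The summary states that the method predicts the appropriate types and positions of annotations and then identifies misused ones. The following is only a minimal sketch of that final comparison step, not the authors' implementation: the function name `find_misuses`, the position keys, the `"O"` label for "no annotation", and the toy data are all hypothetical, and the predictions are hard-coded rather than produced by a fine-tuned model.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a program position (for example a class or
# field identifier) mapped to the annotation found or predicted there.
Annotations = Dict[str, str]

NO_ANNOTATION = "O"  # assumed label for "no annotation at this position"


def find_misuses(actual: Annotations,
                 predicted: Annotations) -> List[Tuple[str, str, str]]:
    """Return (position, actual, predicted) wherever the two disagree."""
    misuses = []
    for position in sorted(set(actual) | set(predicted)):
        found = actual.get(position, NO_ANNOTATION)
        expected = predicted.get(position, NO_ANNOTATION)
        if found != expected:
            misuses.append((position, found, expected))
    return misuses


# Toy data, made up for illustration only.
actual = {
    "UserService#class": "@Repository",
    "UserService.repo#field": "@Autowired",
}
predicted = {
    "UserService#class": "@Service",
    "UserService.repo#field": "@Autowired",
    "UserRepo#class": "@Repository",
}

for position, found, expected in find_misuses(actual, predicted):
    print(f"{position}: found {found}, expected {expected}")
```

In this sketch a disagreement covers both a wrong annotation type and a missing or superfluous annotation; how the published method distinguishes its 14 misuse types is not described in this record.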