BERT4Anno: An annotation misuse detection method for Java

Bibliographic Details
Published in:	Information and Software Technology Vol. 184; p. 107763
Main Authors:	Yang, Jingbo, Ji, Xin, Wu, Wenjun, Liao, Xingchuang, Zhang, Kui, Dong, Linxiao, Xiang, Nan, Jian, Ren
Format:	Journal Article
Language:	English
Published:	Elsevier B.V., 01.08.2025
Subjects:	BERT; Java annotation; Misuse detection; Stack Overflow
ISSN:	0950-5849

Description
Summary:	Developers leverage Java annotations to implement functions such as creating objects and operating databases. However, mastering annotations is challenging, and misused annotations might cause an application to crash. Although state-of-the-art techniques attempt to solve this problem, they neither characterize the types of Java annotation misuse nor leverage project-level information, which results in low efficiency when detecting annotation misuses.

This study aims to summarize Java annotation misuse types and to provide a more efficient method for detecting misused annotations. First, to categorize Java annotation misuses, we conduct an empirical study and curate 321 annotation misuse questions from Stack Overflow. Second, to better detect these misuses, we propose BERT4Anno, a BERT-based method that takes project structure and resource configuration into account, factors often neglected by state-of-the-art methods. In BERT4Anno, a novel Annotation Usage Project Representation (AUPR) technique captures the interconnections among source code, configuration, and project structure. An AUPR-based Named Entity Recognition (ANER) task is then used to fine-tune BERT so that it learns annotation usage knowledge; with this knowledge, the fine-tuned model can detect misused annotations. Finally, to evaluate the method, we use two datasets, mainly curated from GitHub, that comprise 404 Java projects/files with annotation misuse instances.

The Java annotation misuses are categorized into 14 types according to how the curated questions violate correct annotation usage knowledge. The comparison experiment shows that our method outperforms state-of-the-art baselines in precision, recall, and F1 score, and our visualization technique offers insightful interpretations of the mechanism underlying the model's performance advantage.
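To make the NER framing more concrete, the following Java sketch (Java 17 or later) shows one way annotation placement could be cast as token labeling: annotations are stripped from the source, and each annotation's name becomes a `B-<Name>` label on the token it precedes. This is a hypothetical, simplified scheme (whitespace tokenization, a single snippet, no configuration or project-structure context), not the AUPR representation or the ANER label set used by BERT4Anno; the class `AnnotationNerSketch`, the method `toNerExample`, and the `B-`/`O` tags are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

public class AnnotationNerSketch {

    /** One token of the de-annotated source with its (hypothetical) NER label. */
    public record LabeledToken(String token, String label) {}

    /**
     * Strips annotations from whitespace-tokenized Java source and attaches the
     * names of all preceding annotations as a "B-<Name>" label on the next
     * non-annotation token. Tokens with no preceding annotation are labeled "O".
     * Annotations at the very end of the input have no target and are dropped.
     */
    public static List<LabeledToken> toNerExample(String source) {
        List<LabeledToken> out = new ArrayList<>();
        List<String> pending = new ArrayList<>(); // annotation names waiting for a target token
        for (String tok : source.trim().split("\\s+")) {
            if (tok.startsWith("@") && tok.length() > 1) {
                // Keep only the name: "@Table(name=\"x\")" -> "Table"
                // (works only when the argument list contains no whitespace)
                int paren = tok.indexOf('(');
                pending.add(paren > 0 ? tok.substring(1, paren) : tok.substring(1));
            } else {
                String label = pending.isEmpty() ? "O" : "B-" + String.join("+", pending);
                out.add(new LabeledToken(tok, label));
                pending.clear();
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String src = """
            @RestController
            public class UserController {
                @Autowired
                private UserService service;
            }
            """;
        for (LabeledToken t : toNerExample(src)) {
            System.out.println(t.token() + "\t" + t.label());
        }
    }
}
```

A model trained on pairs like these sees only the de-annotated tokens and learns to predict the labels, that is, which annotation belongs where. Running `main` prints each token with its label; for example, `public` is tagged `B-RestController` and `private` is tagged `B-Autowired`.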
By leveraging project-level information, the proposed method can predict the appropriate types and positions of annotations and then identify misused annotations, making detection more efficient.
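Once a model predicts annotation types and positions, misuse detection can be framed as a comparison between predicted and actual annotations. The Java sketch below assumes both are available as maps from a hypothetical position key (such as `UserController#service`) to an annotation name; the class `MisuseCheckSketch` and the finding kinds `MISSING`, `UNEXPECTED`, and `WRONG_TYPE` are illustrative assumptions and do not correspond to the 14 misuse types reported in the study.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeSet;

public class MisuseCheckSketch {

    /** A disagreement between the annotation actually present and the predicted one. */
    public record Finding(String position, String actual, String predicted) {
        public String kind() {
            if (actual == null) return "MISSING";       // model expects an annotation, none present
            if (predicted == null) return "UNEXPECTED"; // annotation present, model expects none
            return "WRONG_TYPE";                        // both present but with different names
        }
    }

    /**
     * Compares actual annotations with predicted ones, both keyed by a
     * hypothetical position string, and returns every disagreement
     * in sorted position order.
     */
    public static List<Finding> findMisuses(Map<String, String> actual,
                                            Map<String, String> predicted) {
        TreeSet<String> positions = new TreeSet<>(actual.keySet());
        positions.addAll(predicted.keySet());
        List<Finding> findings = new ArrayList<>();
        for (String pos : positions) {
            String a = actual.get(pos);    // null if no annotation is present
            String p = predicted.get(pos); // null if the model predicts none
            if (!Objects.equals(a, p)) {
                findings.add(new Finding(pos, a, p));
            }
        }
        return findings;
    }

    public static void main(String[] args) {
        // Made-up sample data for illustration only
        Map<String, String> actual = Map.of(
            "UserController", "Controller",
            "UserController#service", "Autowired",
            "UserController#list", "GetMapping");
        Map<String, String> predicted = Map.of(
            "UserController", "RestController",
            "UserController#service", "Autowired",
            "UserController#repo", "Autowired");
        for (Finding f : findMisuses(actual, predicted)) {
            System.out.println(f.kind() + " at " + f.position()
                + ": actual=" + f.actual() + ", predicted=" + f.predicted());
        }
    }
}
```

With the sample data in `main`, the sketch reports a `WRONG_TYPE` finding for the class-level annotation, an `UNEXPECTED` finding for `UserController#list`, and a `MISSING` finding for `UserController#repo`; the agreeing `Autowired` annotation on `UserController#service` produces no finding.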
DOI:	10.1016/j.infsof.2025.107763