BERT4Anno: An annotation misuse detection method for Java
| Published in: | Information and Software Technology, Vol. 184, p. 107763 |
|---|---|
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V., 01.08.2025 |
| ISSN: | 0950-5849 |

Summary:

Context: Developers leverage Java annotations to implement functionality such as object creation and database operations. However, annotations are difficult to master, and misused annotations can cause an application to crash. Although state-of-the-art techniques attempt to address this problem, they neither categorize Java annotation misuse types nor leverage project-level information, which limits their efficiency in detecting annotation misuses.

Objective: To categorize Java annotation misuse types and to provide a more efficient method for detecting misused annotations.

Method: First, to categorize Java annotation misuses, we conduct an empirical study in which we curate 321 annotation misuse questions from Stack Overflow. Second, to better detect these misuses, we propose BERT4Anno, a BERT-based method that takes project structure and resource configuration into account, factors often neglected by state-of-the-art methods. In BERT4Anno, a novel Annotation Usage Project Representation (AUPR) technique captures the interconnections among source code, configuration, and project structure. In addition, an AUPR-based Named Entity Recognition (ANER) task, built by fine-tuning BERT, is devised to learn annotation usage knowledge, which the fine-tuned model then uses to detect misused annotations. Finally, we evaluate the method on two datasets, mainly curated from GitHub, that together comprise 404 Java projects/files containing annotation misuse instances.

Results: The Java annotation misuses fall into 14 types, based on how the curated questions violate correct annotation usage knowledge. The comparison experiment shows that our method outperforms state-of-the-art baselines in precision, recall, and F1 score, and our visualization technique offers insight into the mechanism behind the model's better performance.

Conclusion: By leveraging project-level information, our method can predict the appropriate types and positions of annotations and then identify misused annotations, making detection more efficient.
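The summary describes detection in two steps: the fine-tuned model predicts which annotation types belong at which positions, and misused annotations are then identified against the annotations actually present in the project. The Java sketch below illustrates only that second comparison step, under loudly stated assumptions: it is not BERT4Anno's implementation, the model's predictions are hard-coded, and the `Usage` and `Finding` records, the `SUSPICIOUS`/`MISSING` labels, and the `OrderService` example are all hypothetical (records require Java 16 or later).

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class AnnotationMisuseSketch {

    /** Hypothetical: an annotation type attached to a program element (class, field, method). */
    record Usage(String element, String annotation) {}

    /** Hypothetical: a reported discrepancy between the code and the model's prediction. */
    record Finding(String kind, Usage usage) {}

    /**
     * Compares the annotations found in the code with those the model predicts.
     * An annotation present in the code but not predicted at that element is reported
     * as SUSPICIOUS (possibly the wrong type or the wrong position); a predicted
     * annotation absent from the code is reported as MISSING.
     */
    static List<Finding> findMisuses(Set<Usage> actual, Set<Usage> predicted) {
        List<Finding> findings = new ArrayList<>();
        for (Usage u : actual) {
            if (!predicted.contains(u)) {
                findings.add(new Finding("SUSPICIOUS", u));
            }
        }
        for (Usage u : predicted) {
            if (!actual.contains(u)) {
                findings.add(new Finding("MISSING", u));
            }
        }
        return findings;
    }

    public static void main(String[] args) {
        // Annotations parsed from the project (hypothetical example data).
        Set<Usage> actual = new LinkedHashSet<>(List.of(
                new Usage("OrderService", "@Controller"),
                new Usage("OrderService.repository", "@Autowired")));

        // Stand-in for the fine-tuned model's output: the expected annotation per element.
        Set<Usage> predicted = new LinkedHashSet<>(List.of(
                new Usage("OrderService", "@Service"),
                new Usage("OrderService.repository", "@Autowired")));

        for (Finding f : findMisuses(actual, predicted)) {
            System.out.println(f.kind() + ": " + f.usage().annotation() + " on " + f.usage().element());
        }
    }
}
```

Run as is, the sketch prints `SUSPICIOUS: @Controller on OrderService` followed by `MISSING: @Service on OrderService`, which together point to a wrong annotation type on that class. A plain set difference treats every mismatch as a candidate misuse; a practical detector would likely also weigh the model's confidence, which this sketch ignores.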
| DOI: | 10.1016/j.infsof.2025.107763 |
|---|---|
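The comparison with baselines is reported in terms of precision, recall, and F1 score. As a minimal reminder of how these metrics combine, the Java sketch below scores a set of reported misuse locations against ground truth using the standard definitions P = TP / (TP + FP), R = TP / (TP + FN), and F1 = 2PR / (P + R). The `DetectionMetrics` class and the location strings are hypothetical, and this summary does not state whether the paper counts matches per annotation, per line, or per file.

```java
import java.util.HashSet;
import java.util.Set;

public class DetectionMetrics {

    /** Precision, recall and F1 for one detection run. */
    record Scores(double precision, double recall, double f1) {}

    /** Scores reported misuse locations against ground-truth misuse locations. */
    static Scores score(Set<String> reported, Set<String> groundTruth) {
        Set<String> hits = new HashSet<>(reported);
        hits.retainAll(groundTruth);                        // reported and actually misused
        int tp = hits.size();
        int fp = reported.size() - tp;                      // reported but not misused
        int fn = groundTruth.size() - tp;                   // misused but not reported
        double p = (tp + fp) == 0 ? 0.0 : (double) tp / (tp + fp);
        double r = (tp + fn) == 0 ? 0.0 : (double) tp / (tp + fn);
        double f1 = (p + r) == 0 ? 0.0 : 2 * p * r / (p + r);
        return new Scores(p, r, f1);
    }

    public static void main(String[] args) {
        // Hypothetical locations of annotation misuses ("file:line").
        Set<String> truth = Set.of("A.java:12", "B.java:7", "C.java:30", "D.java:5");
        Set<String> reported = Set.of("A.java:12", "B.java:7", "E.java:9");
        Scores s = score(reported, truth);
        System.out.println("P=" + s.precision() + " R=" + s.recall() + " F1=" + s.f1());
    }
}
```

In this example there are 2 true positives, 1 false positive, and 2 false negatives, so P = 2/3, R = 1/2, and F1 = 4/7. Because F1 is a harmonic mean, it stays closer to the weaker of the two metrics than an arithmetic average would.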