SemetonBug: A Machine Learning Model for Automatic Bug Detection in Python Code Based on Syntactic Analysis
| Title: | SemetonBug: A Machine Learning Model for Automatic Bug Detection in Python Code Based on Syntactic Analysis |
|---|---|
| Authors: | Bahtiar Imran, Selamet Riadi, Emi Suryadi, M. Zulpahmi, Zaeniah Zaeniah, Erfan Wahyudi |
| Source: | Jurnal Informatika. 12:75-80 |
| Publisher Information: | Universitas Bina Sarana Informatika, 2025. |
| Publication Year: | 2025 |
| Description: | Bug detection in Python programming is a crucial aspect of software development. This study develops an automated bug detection system using feature extraction based on Abstract Syntax Tree (AST) and a Random Forest Classifier model. The dataset consists of 100 manually classified bugged files and 100 non-bugged files. The model is trained using structural code features such as the number of functions, classes, variables, conditions, and exception handling. Evaluation results indicate an accuracy of 86.67%, with balanced precision and recall across both classes. Confusion matrix analysis identifies the presence of false positives and false negatives, albeit in relatively low numbers. The accuracy curve suggests a potential overfitting issue, as training accuracy is higher than testing accuracy. This study demonstrates that the combination of AST-based feature extraction and Random Forest can be an effective approach for automated bug detection, with potential improvements through model optimization and a larger dataset. |
| Document Type: | Article |
| ISSN: | 2528-2247 2355-6579 |
| DOI: | 10.31294/inf.v12i2.25340 |
| Rights: | CC BY SA |
| Accession Number: | edsair.doi...........ee702b6f060dd02ced4a2a60eff657db |
| Database: | OpenAIRE |
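The description above names the structural features used for training (counts of functions, classes, variables, conditions, and exception handling) but this record does not give their exact definitions. The following is a minimal sketch of how such counts could be extracted with Python's standard `ast` module. The function name `extract_features`, the feature keys, and the choice to count every assignment target as a "variable" are assumptions made for illustration, not the authors' implementation.

```python
import ast
from collections import Counter

# Hypothetical feature names, chosen to mirror the list in the abstract.
FEATURES = ["functions", "classes", "variables", "conditions", "exception_handlers"]

def extract_features(source: str) -> dict:
    """Count structural AST features in a Python source string (illustrative only)."""
    tree = ast.parse(source)
    counts = Counter()
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            counts["functions"] += 1
        elif isinstance(node, ast.ClassDef):
            counts["classes"] += 1
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            # Assumption: each assignment target counts once, even if the name repeats.
            counts["variables"] += 1
        elif isinstance(node, ast.If):
            counts["conditions"] += 1
        elif isinstance(node, ast.Try):
            counts["exception_handlers"] += 1
    # Return the features in a fixed order so they can form a feature vector.
    return {name: counts[name] for name in FEATURES}

SAMPLE = '''
class Stack:
    def push(self, item):
        self.items.append(item)

def safe_div(a, b):
    try:
        result = a / b
    except ZeroDivisionError:
        result = None
    if result is None:
        return 0
    return result
'''

print(extract_features(SAMPLE))
```

A vector of such counts per file, paired with a bugged or non-bugged label, is the kind of input a Random Forest classifier (for example scikit-learn's `RandomForestClassifier`) could be trained on. The sketch stops at feature extraction because the record gives no training configuration.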