SemetonBug: A Machine Learning Model for Automatic Bug Detection in Python Code Based on Syntactic Analysis

Saved in:
Bibliographic Details
Title: SemetonBug: A Machine Learning Model for Automatic Bug Detection in Python Code Based on Syntactic Analysis
Authors: Bahtiar Imran, Selamet Riadi, Emi Suryadi, M. Zulpahmi, Zaeniah Zaeniah, Erfan Wahyudi
Source: Jurnal Informatika. 12:75-80
Publisher Information: Universitas Bina Sarana Informatika, 2025.
Publication Year: 2025
Description: Bug detection in Python programming is a crucial aspect of software development. This study develops an automated bug detection system using feature extraction based on Abstract Syntax Tree (AST) and a Random Forest Classifier model. The dataset consists of 100 manually classified bugged files and 100 non-bugged files. The model is trained using structural code features such as the number of functions, classes, variables, conditions, and exception handling. Evaluation results indicate an accuracy of 86.67%, with balanced precision and recall across both classes. Confusion matrix analysis identifies the presence of false positives and false negatives, albeit in relatively low numbers. The accuracy curve suggests a potential overfitting issue, as training accuracy is higher than testing accuracy. This study demonstrates that the combination of AST-based feature extraction and Random Forest can be an effective approach for automated bug detection, with potential improvements through model optimization and a larger dataset.
Document Type: Article
ISSN: 2528-2247
2355-6579
DOI: 10.31294/inf.v12i2.25340
Rights: CC BY SA
Accession Number: edsair.doi...........ee702b6f060dd02ced4a2a60eff657db
Database: OpenAIRE
Description
Abstract:Bug detection in Python programming is a crucial aspect of software development. This study develops an automated bug detection system using feature extraction based on Abstract Syntax Tree (AST) and a Random Forest Classifier model. The dataset consists of 100 manually classified bugged files and 100 non-bugged files. The model is trained using structural code features such as the number of functions, classes, variables, conditions, and exception handling. Evaluation results indicate an accuracy of 86.67%, with balanced precision and recall across both classes. Confusion matrix analysis identifies the presence of false positives and false negatives, albeit in relatively low numbers. The accuracy curve suggests a potential overfitting issue, as training accuracy is higher than testing accuracy. This study demonstrates that the combination of AST-based feature extraction and Random Forest can be an effective approach for automated bug detection, with potential improvements through model optimization and a larger dataset.
ISSN:25282247
23556579
DOI:10.31294/inf.v12i2.25340