A Novel Oversampling Technique to Solve Class Imbalance Problem: A Case Study of Students' Grades Evaluation

Bibliographic Details
Published in: 2021 International Conference on Computing, Networking, Telecommunications & Engineering Sciences Applications (CoNTESA), pp. 69-75
Main authors: Jahin, Dilshad; Emu, Israt Jahan; Akter, Subrina; Patwary, Muhammed J.A.; Bhuiyan, Mohammad Arif Sobhan; Miraz, Mahdi H.
Format: Conference proceeding
Language: English
Published: IEEE, 9 December 2021
Description
Summary: The academic performance of students is one of the critical aspects in ranking educational institutions, particularly at the secondary level. If student performance is not appropriately defined, the institution's reputation is at risk. Data mining could therefore be used for this purpose, to attain high accuracy. However, data that are incomplete, inaccurate and/or noisy, or a dataset with imbalanced class labels, are highly likely to reduce the accuracy of the data mining model. This paper proposes a semi-supervised oversampling method that first prepares a balanced dataset and then classifies the students' grades into a binary class reflecting overall performance in any given course. The student performance dataset from the UCI Machine Learning Repository is used, which contains student performance data for two different courses. A detailed validation result shows that the decision tree algorithm performs better with the balanced dataset than with the imbalanced one.
DOI: 10.1109/CoNTESA52813.2021.9657151
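
The summary describes balancing the dataset before classification. As a rough illustration of what balancing by oversampling means, the sketch below uses plain random oversampling, which is a simpler, swapped-in technique and not the semi-supervised method the paper proposes. The function name `random_oversample`, the feature columns, and the toy rows are all hypothetical and are not taken from the UCI dataset.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class.

    Assumption: this is generic random oversampling, used here only to
    illustrate class balancing; it is not the paper's method.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    # Start from a copy of the original data, then append duplicates.
    out_x, out_y = list(features), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(features, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


# Made-up toy rows: (absences, previous grade) with a binary label.
X = [(2, 14), (0, 16), (4, 12), (1, 15), (3, 13), (10, 6)]
y = ["pass", "pass", "pass", "pass", "pass", "fail"]

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

After the call, both classes have the same number of rows. In practice, oversampling is usually applied to the training split only, so that duplicated rows do not leak into the data used for validation.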