A Novel Oversampling Technique to Solve Class Imbalance Problem: A Case Study of Students' Grades Evaluation

Detailed bibliography
Published in: 2021 International Conference on Computing, Networking, Telecommunications & Engineering Sciences Applications (CoNTESA), pp. 69-75
Main authors: Jahin, Dilshad; Emu, Israt Jahan; Akter, Subrina; Patwary, Muhammed J.A.; Bhuiyan, Mohammad Arif Sobhan; Miraz, Mahdi H.
Medium: Conference paper
Language: English
Published: IEEE, 9 December 2021
Description
Abstract: The academic performance of students is one of the critical aspects in ranking educational institutions, particularly at the secondary level. If student performance is not appropriately defined, the institution's reputation is at risk. Data mining could therefore be used for this purpose to attain high accuracy. However, data that are incomplete, inaccurate, or noisy, or that have imbalanced class labels, are highly likely to reduce the accuracy of a data mining model. This paper proposes a semi-supervised oversampling method that first prepares a balanced dataset and then classifies students' grades, reflecting their overall performance in any given course, into two classes. The study uses the student performance dataset from the UCI Machine Learning Repository, which contains performance-related data for two different courses. A detailed validation shows that the decision tree algorithm performs better on the balanced dataset than on the imbalanced one.
DOI:10.1109/CoNTESA52813.2021.9657151
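The abstract does not describe the proposed semi-supervised oversampling procedure in detail. For orientation only, here is a minimal sketch of plain random oversampling, a standard baseline and not the authors' method, applied after converting grades to a binary class. It assumes final grades on the 0-20 scale used in the UCI dataset and a hypothetical pass threshold of 10. The feature rows, the data values, and the function names `binarize_grades` and `random_oversample` are invented for illustration.

```python
import random
from collections import Counter


def binarize_grades(final_grades, threshold=10):
    # Hypothetical rule (assumption): grades at or above the threshold are "pass".
    return ["pass" if g >= threshold else "fail" for g in final_grades]


def random_oversample(rows, labels, seed=0):
    # Plain random oversampling, NOT the paper's semi-supervised method:
    # duplicate randomly chosen rows of each smaller class until every
    # class has as many rows as the largest class.
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        members = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(members))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data (invented): hypothetical [study_time, absences] per student.
features = [[2, 4], [3, 0], [1, 10], [4, 2], [2, 6], [3, 1], [1, 14], [2, 3]]
final_grades = [15, 12, 8, 18, 11, 14, 6, 13]

labels = binarize_grades(final_grades)
X_bal, y_bal = random_oversample(features, labels)
print(Counter(labels))  # class counts before balancing
print(Counter(y_bal))   # class counts after balancing
```

Random duplication only copies existing minority rows. The balanced set would then be passed to a classifier such as a decision tree, which is the balanced-versus-imbalanced comparison the abstract reports.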