Application of Natural Neighbor-based Algorithm on Oversampling SMOTE Algorithms

Bibliographic Details
Published in: 2021 7th International Conference on Engineering, Applied Sciences and Technology (ICEAST), pp. 217-220
Main Authors: Srinilta, Chutimet; Kanharattanachai, Sivakorn
Format: Conference Proceeding
Language: English
Published: IEEE 01.04.2021
Description
Summary: Classification performance depends heavily on data distribution. Real-world data are often imbalanced, with one class occurring more often than the others. SMOTE-based algorithms are commonly used to handle this class imbalance problem. A key parameter required by algorithms in the SMOTE family is k, the number of nearest neighbors considered for a given data point. The value of k that best fits the dataset gives the optimum performance. This paper proposes an approach that suggests a value for k using the Natural Neighbor algorithm. Datasets are balanced by four SMOTE-based algorithms: standard SMOTE, Safe-Level-SMOTE, ModifiedSMOTE and Weighted-SMOTE. The F-measure and Recall metrics are used to evaluate the classification performance of a Support Vector Machine classifier on six datasets with different imbalance ratios. The results show that the average classification performance achieved with the proposed values of k is closer to the optimum than the performance obtained with the default value of k.
DOI: 10.1109/ICEAST52143.2021.9426310
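
Illustration: The idea in the summary, deriving k from a natural-neighbor search and passing it to SMOTE, can be sketched as follows. This is a minimal sketch and not the authors' implementation. It rests on several assumptions: the search is a simplified form of natural-neighbor searching that stops when the number of points never chosen as anyone's neighbor is zero or stops changing; the round at which it stops is used directly as k; the search runs on the minority class only; and only standard SMOTE interpolation is shown. The function names `natural_neighbor_eigenvalue` and `smote` and the toy data are hypothetical.

```python
import math
import random


def natural_neighbor_eigenvalue(points):
    """Return the round r at which a simplified natural-neighbor search stabilizes."""
    n = len(points)
    # For each point, the indices of all other points sorted by distance.
    order = [
        sorted((j for j in range(n) if j != i),
               key=lambda j: math.dist(points[i], points[j]))
        for i in range(n)
    ]
    reverse_count = [0] * n  # how often each point was chosen as a neighbor
    prev_orphans = None
    for r in range(1, n):
        # In round r, every point votes for its r-th nearest neighbor.
        for i in range(n):
            reverse_count[order[i][r - 1]] += 1
        orphans = sum(1 for c in reverse_count if c == 0)
        # Stop when every point has been chosen, or the orphan count is stable.
        if orphans == 0 or orphans == prev_orphans:
            return r
        prev_orphans = orphans
    return n - 1


def smote(minority, k, n_new, rng):
    """Standard SMOTE: interpolate between a sample and one of its k nearest neighbors."""
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbors = sorted((p for p in minority if p is not x),
                           key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from x to nn
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic


# Toy minority class (hypothetical data): a unit square plus one distant point.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (5.0, 5.0)]
k = natural_neighbor_eigenvalue(minority)  # assumption: use this value as SMOTE's k
new_points = smote(minority, k, n_new=3, rng=random.Random(0))
```

The point of the sketch is that k comes from the data rather than from a fixed default, so a tightly clustered class and a scattered one can receive different values. How the paper maps the natural-neighbor result to k, and how Safe-Level-SMOTE, ModifiedSMOTE and Weighted-SMOTE use it, is not stated in the summary.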