Application of Natural Neighbor-based Algorithm on Oversampling SMOTE Algorithms

Bibliographic details
Published in:	2021 7th International Conference on Engineering, Applied Sciences and Technology (ICEAST), pp. 217-220
Main authors:	Srinilta, Chutimet; Kanharattanachai, Sivakorn
Format:	Conference paper
Language:	English
Published:	IEEE, 1 April 2021
Subjects:	Classification algorithms; imbalanced data; Measurement; natural neighbor; oversampling; SMOTE; Support vector machines

Description
Summary:	Classification performance depends highly on data distribution. In real life, data are often imbalanced, with one class occurring more often than the others. SMOTE-based algorithms are commonly used to handle the class imbalance problem. One key parameter that algorithms in the SMOTE family require is k, the number of nearest neighbors considered for a given data point. The value of k that best fits the dataset gives the optimum performance. This paper proposes an approach to suggest a value of the parameter k using the Natural Neighbor algorithm. Datasets are balanced by four SMOTE-based algorithms: standard SMOTE, Safe-Level-SMOTE, ModifiedSMOTE and Weighted-SMOTE. The F-measure and Recall metrics are used to evaluate the classification performance of a Support Vector Machine classifier running against six datasets with different imbalance ratios. The results show that the average classification performance achieved by the proposed values of k is closer to the optimum than the performance given by the default value of k.
DOI:	10.1109/ICEAST52143.2021.9426310
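
The summary does not spell out how the Natural Neighbor search produces a value of k, so the following is only a rough sketch under stated assumptions. It assumes one commonly described formulation of the search: in round r, every point "votes" for its r-th nearest neighbor, and the search stops once every point has received at least one vote or the number of points with no votes stops changing. The round at which it stops is then taken as the suggested k. The function name natural_neighbor_k and the choice to run it on the minority class are hypothetical and may differ from what the paper actually does.

```python
import numpy as np


def natural_neighbor_k(X):
    """Return the round r at which the natural-neighbor search stabilises.

    Assumption: the stopping round is used directly as the suggested k.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)

    # Pairwise Euclidean distances between all points.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Force each point to rank itself first, even when duplicates exist.
    np.fill_diagonal(dists, -1.0)
    # order[i, r] is the index of point i's r-th nearest neighbor (r >= 1).
    order = np.argsort(dists, axis=1, kind="stable")

    reverse_count = np.zeros(n, dtype=int)  # votes received by each point
    prev_orphans = -1
    for r in range(1, n):
        # Every point votes for its r-th nearest neighbor.
        np.add.at(reverse_count, order[:, r], 1)
        orphans = int(np.sum(reverse_count == 0))
        # Stop when nobody is left without a vote, or the count is stable.
        if orphans == 0 or orphans == prev_orphans:
            return r
        prev_orphans = orphans
    return n - 1


# Hypothetical usage on a toy minority class of four 1-D samples.
X_minority = np.array([[0.0], [1.0], [2.0], [3.0]])
suggested_k = natural_neighbor_k(X_minority)
```

A value obtained this way could then be supplied wherever a SMOTE implementation asks for its neighbor count, for example the k_neighbors argument of SMOTE in the imbalanced-learn library, in place of the default.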