Application of Natural Neighbor-based Algorithm on Oversampling SMOTE Algorithms
| Published in: | 2021 7th International Conference on Engineering, Applied Sciences and Technology (ICEAST), pp. 217-220 |
|---|---|
| Main authors: | , |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 01.04.2021 |
| Subjects: | |
| Summary: | Classification performance depends heavily on data distribution. Real-world data are often imbalanced, with one class occurring more often than the others. SMOTE-based algorithms are commonly used to handle this class imbalance problem. One key parameter required by algorithms in the SMOTE family is k, the number of nearest neighbors considered for a given data point. The value of k that best fits the dataset gives the optimum performance. This paper proposes an approach that suggests a value for k using the Natural Neighbor algorithm. Datasets are balanced by four SMOTE-based algorithms: standard SMOTE, Safe-Level-SMOTE, Modified SMOTE, and Weighted-SMOTE. The F-measure and Recall metrics are used to evaluate the classification performance of a Support Vector Machine classifier on six datasets with different imbalance ratios. The results show that the average classification performance achieved with the proposed values of k is closer to the optimum than the performance obtained with the default value of k. |
|---|---|
| DOI: | 10.1109/ICEAST52143.2021.9426310 |
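
The summary describes using a Natural Neighbor search to suggest k instead of relying on a default. The record does not give the paper's exact procedure, so the following is only a minimal sketch of the natural-neighbor search as it is commonly described: in round r every point "votes" for its r-th nearest neighbor, and the search stops once every point has received a vote or the number of points without votes stops changing. The function name `natural_neighbor_k`, the `max_r` cap, the choice to return the stopping round as the suggested k, and the choice of which points to pass in (all samples or only the minority class) are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def natural_neighbor_k(X, max_r=None):
    """Suggest k as the round at which the natural-neighbor search stabilises.

    Hypothetical helper: a sketch of the general technique, not the
    paper's implementation.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    if n < 2:
        raise ValueError("need at least two points")
    max_r = n - 1 if max_r is None else min(max_r, n - 1)

    # Pairwise squared Euclidean distances (fine for small datasets).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    # Force each point to sort first in its own row, even with duplicates.
    np.fill_diagonal(d2, -1.0)
    order = np.argsort(d2, axis=1, kind="stable")

    reverse_count = np.zeros(n, dtype=int)  # votes received by each point
    prev_orphans = None
    for r in range(1, max_r + 1):
        # Each point votes for its r-th nearest neighbour (column 0 is itself).
        np.add.at(reverse_count, order[:, r], 1)
        orphans = int((reverse_count == 0).sum())
        # Stop when nobody is left without a vote, or the count has stalled.
        if orphans == 0 or orphans == prev_orphans:
            return r
        prev_orphans = orphans
    return max_r


if __name__ == "__main__":
    # Four evenly spaced points on a line: the search stops in round 2.
    print(natural_neighbor_k([[0.0], [1.0], [2.0], [3.0]]))
```

The returned value could then be passed to a SMOTE implementation's neighbor-count parameter, for example `k_neighbors` in imbalanced-learn's `SMOTE`, provided it is smaller than the number of minority samples.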