Radial-Based Undersampling for imbalanced data classification

Bibliographic details
Published in: Pattern Recognition, Vol. 102, p. 107262
Main author: Koziarski, Michał
Format: Journal Article
Language: English
Publication details: Elsevier Ltd, 1 June 2020
ISSN:0031-3203, 1873-5142
Description
Summary:
• The concept of mutual class potential is extended to the undersampling procedure.
• Radial-Based Undersampling offers significantly lower computational complexity.
• The method achieves significantly better results when combined with selected classifiers.
• Areas of applicability of the algorithm are identified.

Data imbalance remains one of the most widespread problems affecting contemporary machine learning. The negative effect data imbalance can have on traditional learning algorithms is most severe in combination with other dataset difficulty factors, such as small disjuncts, the presence of outliers, and an insufficient number of training observations. The aforementioned difficulty factors can also limit the applicability of some methods for dealing with data imbalance, in particular the neighborhood-based oversampling algorithms based on SMOTE. Radial-Based Oversampling (RBO) was previously proposed to mitigate some of the limitations of the neighborhood-based methods. In this paper, we examine the possibility of utilizing the concept of mutual class potential, used to guide the oversampling process in RBO, in the undersampling procedure. The computational complexity analysis indicates a significantly reduced time complexity of the proposed Radial-Based Undersampling algorithm, and the results of the experimental study indicate its usefulness, especially on difficult datasets.
DOI:10.1016/j.patcog.2020.107262