An Adaptive Clustering Algorithm Based on Local-Density Peaks for Imbalanced Data Without Parameters

Imbalanced data clustering is a challenging problem in machine learning. The main difficulty is caused by the imbalance in both cluster size and data density distribution. To address this problem, we propose a novel clustering algorithm called LDPI based on local-density peaks in this study. First,...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IEEE transactions on knowledge and data engineering Jg. 35; H. 4; S. 3419 - 3432
Hauptverfasser: Tong, Wuning, Wang, Yuping, Liu, Delong
Format: Journal Article
Sprache:Englisch
Veröffentlicht: New York IEEE 01.04.2023
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Schlagworte:
ISSN:1041-4347, 1558-2191
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Imbalanced data clustering is a challenging problem in machine learning. The main difficulty is caused by the imbalance in both cluster size and data density distribution. To address this problem, we propose a novel clustering algorithm called LDPI based on local-density peaks in this study. First, an initial sub-cluster construction scheme is designed based on a 3-dimensional (3-D) decision graph that can easily detect the initial sub-cluster centers and identify the noise points. Second, a sub-cluster updating strategy is designed, which can automatically identify the false sub-cluster centers and update the initial sub-clusters. Third, a sub-cluster merging scheme is designed, which merges the updated initial sub-clusters into final clusters. Consequently, the proposed algorithm has three advantages: 1) It does not require any input parameters; 2) It can automatically determine the cluster centers and number of clusters; 3) It is suitable for imbalanced datasets and datasets with arbitrary shapes and distributions. The effectiveness of LDPI is demonstrated experimentally and the superiority of LDPI is identified by comparison with 5 state-of-the-art algorithms.
Bibliographie:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:1041-4347
1558-2191
DOI:10.1109/TKDE.2021.3138962