Fast Density Peaks Clustering Algorithm Based on Approximate k-Nearest Neighbors

Bibliographic details
Published in: IEEE Transactions on Knowledge and Data Engineering, Vol. 37, No. 10, pp. 5878-5889
Main authors: Ding, Shifei; Li, Chao; Xu, Xiao; Guo, Lili; Ding, Ling; Wu, Xindong
Format: Journal Article
Language: English
Published: IEEE, 1 October 2025
ISSN: 1041-4347, 1558-2191
Abstract: Density peaks clustering (DPC) is a density-based clustering algorithm that has been widely studied and applied in recent years because it needs only one parameter, is non-iterative, and is robust. However, it cannot identify cluster centers effectively, and its time and space complexities are too high. To address this, this paper proposes a fast density peaks clustering algorithm based on approximate k-nearest neighbors (FDPAN). First, it uses the Balanced K-means based Hierarchical K-means (BKHK) method to partition the data and quickly find the approximate k-nearest neighbors (AKNN), which improves the algorithm's efficiency on large-scale, high-dimensional data. Meanwhile, three-way clustering is used to improve the neighbor search for points on the partition boundaries. Then, the local density and relative distance of DPC are recalculated using the AKNN. Finally, following the similar-density chain, connected high-density points are labeled while the cluster centers are being searched for, and each remaining point is assigned to the cluster of its nearest higher-density point. Theoretical analysis and experiments on synthetic and real datasets show that, compared with DPC and its variants, FDPAN achieves better clustering results and shorter running time on large-scale, high-dimensional data.
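The general pipeline described in the abstract (approximate kNN from a balanced partition, kNN-based density and relative distance, then assignment to the nearest higher-density point) can be illustrated with a minimal pure-Python sketch. It is not FDPAN itself, because the abstract gives no formulas, so several pieces are assumptions: `balanced_partition` is a simplified, hypothetical stand-in for BKHK (recursive median bisection along a pivot axis rather than balanced k-means), the density exp(-mean squared kNN distance) follows the DPC-KNN style rather than a formula from this paper, three-way clustering and the similar-density chain are omitted, and all function names are illustrative only.

```python
import math


def balanced_partition(idx, X, leaf_size):
    """Split point indices into leaves of at most leaf_size points.

    Hypothetical stand-in for BKHK: instead of balanced k-means, each level
    projects the points onto the axis between two far-apart pivots and cuts
    at the median, so both halves have (almost) equal size.
    """
    if len(idx) <= leaf_size:
        return [idx]
    a = X[idx[0]]
    b = X[max(idx, key=lambda i: math.dist(X[i], a))]
    axis = [bj - aj for aj, bj in zip(a, b)]
    ordered = sorted(idx, key=lambda i: sum(u * x for u, x in zip(axis, X[i])))
    mid = len(ordered) // 2
    return (balanced_partition(ordered[:mid], X, leaf_size)
            + balanced_partition(ordered[mid:], X, leaf_size))


def approximate_knn(X, k, leaf_size):
    """Approximate kNN: exact search, but only inside each point's own leaf."""
    nbrs = [[] for _ in X]
    for leaf in balanced_partition(list(range(len(X))), X, leaf_size):
        for i in leaf:
            others = sorted((j for j in leaf if j != i),
                            key=lambda j: math.dist(X[i], X[j]))
            nbrs[i] = others[:k]
    return nbrs


def knn_dpc(X, k, n_clusters, leaf_size):
    """Hypothetical kNN-based DPC sketch; returns one cluster label per point."""
    n = len(X)
    nbrs = approximate_knn(X, k, leaf_size)
    # Local density (DPC-KNN style, an assumption): exp(-mean squared kNN distance)
    rho = [math.exp(-sum(math.dist(X[i], X[j]) ** 2 for j in nb) / max(len(nb), 1))
           for i, nb in enumerate(nbrs)]
    # Strict density order; ties broken by index so "higher density" is well defined
    order = sorted(range(n), key=lambda i: -rho[i])
    rank = {i: r for r, i in enumerate(order)}
    delta, parent = [0.0] * n, [None] * n
    for i in order:
        if rank[i] == 0:
            # Densest point: by DPC convention, delta is its largest distance to any point
            delta[i] = max(math.dist(X[i], X[j]) for j in range(n))
            continue
        # Relative distance: nearest higher-density point, searched among the
        # approximate neighbors first, with a full scan only if none qualifies
        higher = [j for j in nbrs[i] if rank[j] < rank[i]] or order[:rank[i]]
        parent[i] = min(higher, key=lambda j: math.dist(X[i], X[j]))
        delta[i] = math.dist(X[i], X[parent[i]])
    # Centers: the n_clusters points with the largest gamma = rho * delta
    gamma = [r * d for r, d in zip(rho, delta)]
    centers = sorted(range(n), key=lambda i: -gamma[i])[:n_clusters]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Visit points from high to low density; each inherits its parent's label
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels


# Two well-separated 4 x 4 grids; each grid should receive its own label
A = [(0.1 * i, 0.1 * j) for i in range(4) for j in range(4)]
B = [(10 + x, 10 + y) for x, y in A]
labels = knn_dpc(A + B, k=4, n_clusters=2, leaf_size=9)
print(labels)
```

Restricting the neighbor search to a leaf is what makes the kNN approximate: a point near a leaf boundary can miss true neighbors in the adjacent leaf, which is the problem the abstract says three-way clustering is used to mitigate. The full scan for the relative distance runs only for points that have no higher-density point among their approximate neighbors, so most points avoid the quadratic search that plain DPC performs.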
DOI: 10.1109/TKDE.2025.3589794