A fast density peaks clustering algorithm with sparse search

Bibliographic details
Published in: Information Sciences, Vol. 554, pp. 61-83
Main authors: Xu, Xiao; Ding, Shifei; Wang, Yanru; Wang, Lijuan; Jia, Weikuan
Format: Journal Article
Language: English
Published: Elsevier Inc, 1 April 2021
ISSN:0020-0255, 1872-6291
Description
Summary: Given a large unlabeled set of complex data, how to group it into clusters efficiently and effectively remains a challenging problem. The density peaks clustering (DPC) algorithm is an emerging algorithm that identifies cluster centers based on a decision graph. Without setting the number of cluster centers, DPC can effectively recognize the clusters. However, the similarity between every two data points must be calculated to construct a decision graph, which results in high computational complexity. To overcome this issue, we propose a fast sparse search density peaks clustering (FSDPC) algorithm to enhance the DPC, which constructs a decision graph with fewer similarity calculations to identify cluster centers quickly. In FSDPC, we design a novel sparse search strategy that measures similarity only between each data point and its nearest neighbors. Therefore, FSDPC can enhance the efficiency of the DPC while maintaining satisfactory results. We also propose a novel random third-party data point method to search for the nearest neighbors, which introduces neither additional parameters nor high computational complexity. The experimental results on synthetic datasets and real-world datasets indicate that the proposed algorithm consistently outperforms the DPC and other state-of-the-art algorithms.
DOI:10.1016/j.ins.2020.11.050
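
Illustrative note: the summary says that baseline DPC must compare every pair of data points to build its decision graph. The sketch below shows that baseline step only, not the FSDPC sparse search or the random third-party data point method, whose details are not given in this record. The function name `dpc_decision_graph`, the Gaussian density kernel, and the cutoff parameter `dc` are assumptions chosen for illustration.

```python
import numpy as np

def dpc_decision_graph(X, dc):
    """Baseline DPC sketch: return (rho, delta) for each point.

    rho   = local density (Gaussian kernel with cutoff dc; an assumed choice)
    delta = distance to the nearest point of higher density
    """
    X = np.asarray(X, dtype=float)
    # All pairwise Euclidean distances: the O(n^2) cost the summary refers to.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Subtract 1.0 to drop each point's contribution to its own density.
    rho = np.exp(-(dist / dc) ** 2).sum(axis=1) - 1.0
    delta = np.empty(len(X))
    for i in range(len(X)):
        higher = rho > rho[i]
        # By convention, the densest point gets its largest distance.
        delta[i] = dist[i, higher].min() if higher.any() else dist[i].max()
    return rho, delta

# Usage: two well-separated synthetic blobs (hypothetical data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (30, 2)), rng.normal(5.0, 0.3, (30, 2))])
rho, delta = dpc_decision_graph(X, dc=0.5)
# Points with both high rho and high delta are candidate cluster centers;
# ranking by rho * delta is a common heuristic, used here as an assumption.
candidates = np.argsort(rho * delta)[::-1][:2]
```

In the decision graph, each point is plotted by its `rho` and `delta`, and cluster centers stand out as points where both values are large. The dense `dist` matrix is what makes this baseline quadratic in the number of points.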