A novel density peaks clustering algorithm based on Hopkins statistic

Bibliographic details
Published in: Expert systems with applications, Vol. 201, p. 116892
Main authors: Zhang, Ruilin; Miao, Zhenguo; Tian, Ye; Wang, Hongpeng
Format: Journal Article
Language: English
Published: New York: Elsevier Ltd, 1 September 2022
Elsevier BV
ISSN: 0957-4174, 1873-6793
Description
Summary: Density peaks clustering (DPC) is a promising algorithm due to its straightforward and easy implementation. However, most of its improvements still rely on expert knowledge, strong prior information, or complex iterations to identify the cluster centers, which inevitably adds subjectivity and instability. Moreover, some crisp and sensitive density metrics sometimes reduce the representativeness of the center, resulting in poor clustering. To this end, we propose an enhanced algorithm, called Density peaks clustering based on Hopkins Statistic (DPC-AHS). The main property of the method is that it identifies cluster centers automatically, without prior information. Specifically, with a two-stage strategy, we first specify some objects as candidate centers by linear regression and residual analysis. Subsequently, inspired by ideas from optimization, we design a novel validity index (AHS) that replaces the original decision graph to find the desired centers among the candidates. Another novel part of DPC-AHS is that the proposed adjusted-k-nearest neighbors (A-kNN) dynamically defines the neighbors during the process, which further enhances the robustness against outliers. Finally, we compare the performance of DPC-AHS with 7 state-of-the-art methods over synthetic, UCI, and image datasets. Experiments on 25 datasets and in-depth discussion cases from 5 perspectives demonstrate that our algorithm is feasible and effective in clustering and center identification.

Highlights:
• A novel density peaks clustering based on Hopkins Statistic (DPC-AHS) is proposed.
• DPC-AHS can automatically find clusters and centers without manual participation.
• A cluster validity index AHS with low complexity is designed to evaluate clustering.
• Experiments and discussions on various datasets show the effectiveness of our method.
• DPC-AHS requires only one parameter and can be applied to high dimensional data.
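
For orientation, the two ingredients named in the summary can be sketched in their textbook forms. This is not the paper's method: the record does not define AHS, A-kNN, or the regression-based candidate step, so the sketch below uses the classic DPC quantities (local density rho, and delta, the distance to the nearest denser point) and the standard, unpowered Hopkins statistic. The function names, the cutoff `dc`, the sample size `m`, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def dpc_quantities(X, dc):
    """Classic DPC quantities (assumed textbook form, not DPC-AHS):
    rho   = Gaussian-kernel local density with cutoff dc,
    delta = distance to the nearest point of higher density."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    rho = np.exp(-(d / dc) ** 2).sum(axis=1) - 1.0  # subtract the self term
    order = np.argsort(-rho)  # indices from densest to sparsest
    delta = np.empty(len(X))
    delta[order[0]] = d[order[0]].max()  # usual convention for the densest point
    for rank in range(1, len(X)):
        i = order[rank]
        delta[i] = d[i, order[:rank]].min()
    return rho, delta

def hopkins(X, m, rng):
    """Standard Hopkins statistic H = sum(u) / (sum(u) + sum(w)).
    u: distance from m uniform points (in the data's bounding box) to the data,
    w: nearest-neighbour distance of m sampled data points.
    Under this convention H near 0.5 suggests no structure, H near 1 clusters."""
    n, dim = X.shape
    lo, hi = X.min(axis=0), X.max(axis=0)
    U = rng.uniform(lo, hi, size=(m, dim))
    u = np.linalg.norm(U[:, None, :] - X[None, :, :], axis=-1).min(axis=1)
    idx = rng.choice(n, size=m, replace=False)
    dw = np.linalg.norm(X[idx][:, None, :] - X[None, :, :], axis=-1)
    dw[np.arange(m), idx] = np.inf  # ignore each sample's distance to itself
    w = dw.min(axis=1)
    return u.sum() / (u.sum() + w.sum())

# Toy data (hypothetical): two tight, well-separated Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(5, 0.1, (50, 2))])

rho, delta = dpc_quantities(X, dc=0.2)
gamma = rho * delta                 # points with high rho AND high delta
centers = np.argsort(-gamma)[:2]    # here the number of centers is given by hand
H = hopkins(X, m=20, rng=rng)
```

In this sketch the number of centers is still chosen manually from `gamma`, which is exactly the kind of manual step the summary says DPC-AHS aims to remove; how the paper combines the Hopkins statistic with center selection is not stated in this record.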
DOI: 10.1016/j.eswa.2022.116892