An improved density peaks clustering algorithm with fast finding cluster centers

Bibliographic details
Published in: Knowledge-based systems, Vol. 158, pp. 65-74
Main authors: Xu, Xiao; Ding, Shifei; Shi, Zhongzhi
Format: Journal Article
Language: English
Published: Amsterdam: Elsevier B.V, 15 October 2018
Elsevier Science Ltd
ISSN: 0950-7051, 1872-7409
Description
Summary: Speed and efficiency are common requirements for all clustering algorithms. The density peaks clustering (DPC) algorithm handles non-spherical clusters well. However, because large-scale data sets are difficult to store and DPC has high computational complexity, conducting effective data mining has become a challenge. To address this issue, we propose an improved density peaks clustering algorithm that finds cluster centers quickly. It improves the efficiency of the DPC algorithm by screening points with higher local density using two novel prescreening strategies. The first strategy is based on grid division (GDPC) and screens points according to the density of the corresponding grid cells. The second strategy is based on circle division (CDPC) and screens points according to the uneven distribution of the data set in the corresponding circles. Theoretical analysis and experimental results show that both prescreening strategies reduce the computational complexity, and that the proposed algorithm is not only more satisfactory than the DPC algorithm but also superior to the well-known Nyström-SC algorithm on large-scale data sets. Moreover, because the two prescreening strategies rest on different principles, the first strategy is faster and the second is more accurate on large-scale data sets.
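Note: the summary describes grid-based prescreening only in outline, so the following is a minimal illustrative sketch rather than the authors' GDPC procedure. It assumes a uniform grid and a hypothetical keep rule (retain the points in the most populated cells), then applies the standard DPC quantities, local density and distance to the nearest denser point, to the retained points only. The names `grid_prescreen`, `density_peaks`, `cells_per_dim`, `keep_fraction`, and `d_c` are illustrative assumptions, not taken from the paper.

```python
import math
import random
from collections import Counter


def grid_prescreen(points, cells_per_dim=10, keep_fraction=0.5):
    """Return indices of points lying in the most populated grid cells.

    Assumed rule (not from the paper): rank points by the population of
    their cell and keep every point whose cell is at least as populated
    as the cell of the point at the `keep_fraction` rank.
    """
    dims = range(len(points[0]))
    lows = [min(p[k] for p in points) for k in dims]
    spans = [(max(p[k] for p in points) - lows[k]) or 1.0 for k in dims]

    def cell_of(p):
        # Map each coordinate to an integer cell index in [0, cells_per_dim).
        return tuple(
            min(int((p[k] - lows[k]) / spans[k] * cells_per_dim),
                cells_per_dim - 1)
            for k in dims
        )

    cells = [cell_of(p) for p in points]
    population = Counter(cells)
    ranked = sorted(range(len(points)), key=lambda i: -population[cells[i]])
    n_keep = max(1, int(len(points) * keep_fraction))
    cutoff = population[cells[ranked[n_keep - 1]]]
    return [i for i in range(len(points)) if population[cells[i]] >= cutoff]


def density_peaks(points, d_c, n_centers):
    """Return indices of the n_centers points with the largest rho * delta."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Gaussian-kernel local density; subtract 1.0 to drop the self term.
    rho = [sum(math.exp(-(d / d_c) ** 2) for d in row) - 1.0 for row in dist]
    # Visit points from densest to sparsest so ties are broken by order.
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n
    delta[order[0]] = max(dist[order[0]])
    for rank in range(1, n):
        i = order[rank]
        # Distance to the nearest point that precedes i in density order.
        delta[i] = min(dist[i][j] for j in order[:rank])
    return sorted(range(n), key=lambda i: -rho[i] * delta[i])[:n_centers]


# Synthetic demo data: two well-separated Gaussian blobs (illustration only).
random.seed(0)
blob_a = [(random.gauss(0, 0.3), random.gauss(0, 0.3)) for _ in range(200)]
blob_b = [(random.gauss(5, 0.3), random.gauss(5, 0.3)) for _ in range(200)]
points = blob_a + blob_b

candidates = grid_prescreen(points)
subset = [points[i] for i in candidates]
centers = [subset[i] for i in density_peaks(subset, d_c=0.3, n_centers=2)]
```

In this sketch the pairwise distance matrix is built only over the prescreened candidates, which is where a saving relative to plain DPC would come from. How the paper assigns the screened-out points to clusters is not stated in the summary and is not modeled here.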
DOI: 10.1016/j.knosys.2018.05.034