An improved density peaks clustering algorithm with fast finding cluster centers
| Published in: | Knowledge-based systems, Vol. 158, pp. 65-74 |
|---|---|
| Main authors: | , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Amsterdam: Elsevier B.V., 15 October 2018; Elsevier Science Ltd |
| Subjects: | |
| ISSN: | 0950-7051, 1872-7409 |
| Summary: | Speed and efficiency are common requirements for all clustering algorithms. The density peaks clustering (DPC) algorithm handles non-spherical clusters well. However, because large-scale data sets are difficult to store and DPC has high computational complexity, conducting effective data mining has become a challenge. To address this issue, we propose an improved density peaks clustering algorithm that finds cluster centers quickly; it improves the efficiency of DPC by screening for points with higher local density using two novel prescreening strategies. The first strategy is based on grid division (GDPC) and screens points according to the density of their corresponding grid cells. The second strategy is based on circle division (CDPC) and screens points according to the uneven distribution of the data within the corresponding circles. Theoretical analysis and experimental results show that both prescreening strategies reduce computational complexity, and that the proposed algorithm not only performs more satisfactorily than DPC but is also superior to the well-known Nyström-SC algorithm on large-scale data sets. Moreover, because the two prescreening strategies rest on different principles, the first is faster and the second is more accurate on large-scale data sets. |
|---|---|
| DOI: | 10.1016/j.knosys.2018.05.034 |
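
The summary describes screening points by grid-cell density before searching for density peaks. The record gives no further detail, so the following is only a minimal sketch of that general idea and not the paper's GDPC procedure. The function names, the `min_count` threshold, the cutoff distance `d_c`, and the index-based tie-breaking rule are all assumptions made for illustration. The scoring step follows the usual DPC quantities: local density (neighbours within `d_c`) and distance to the nearest point of higher density.

```python
import math
from collections import defaultdict


def grid_prescreen(points, cell_size, min_count):
    """Return indices of points lying in grid cells with at least min_count points.

    Hypothetical prescreening rule: sparse cells are assumed not to hold centers.
    """
    cells = defaultdict(list)
    for i, p in enumerate(points):
        key = tuple(math.floor(c / cell_size) for c in p)
        cells[key].append(i)
    return sorted(i for members in cells.values()
                  if len(members) >= min_count for i in members)


def center_scores(points, candidates, d_c):
    """Compute (rho, delta) for candidate points only.

    rho: number of other points (from the full data set) closer than d_c.
    delta: distance to the nearest candidate ranked higher; ties in rho are
    broken by the lower index (an assumption, to keep the ranking strict).
    """
    rho = {i: sum(1 for j, q in enumerate(points)
                  if j != i and math.dist(points[i], q) < d_c)
           for i in candidates}
    scores = {}
    for i in candidates:
        higher = [math.dist(points[i], points[j]) for j in candidates
                  if (rho[j], -j) > (rho[i], -i)]
        if higher:
            delta = min(higher)
        else:
            # Top-ranked candidate: use its largest distance to any other candidate.
            delta = max((math.dist(points[i], points[j])
                         for j in candidates if j != i), default=0.0)
        scores[i] = (rho[i], delta)
    return scores


def pick_centers(points, k, cell_size, min_count, d_c):
    """Return the k candidate indices with the largest rho * delta."""
    candidates = grid_prescreen(points, cell_size, min_count)
    scores = center_scores(points, candidates, d_c)
    ranked = sorted(candidates,
                    key=lambda i: scores[i][0] * scores[i][1], reverse=True)
    return ranked[:k]
```

In this sketch the pairwise distance search for `delta` runs only over the prescreened candidates, which is where a saving over plain DPC would come from; the density count still scans every point and would itself need a grid lookup to scale.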