Fast density clustering strategies based on the k-means algorithm
| Published in: | Pattern Recognition, Vol. 71, pp. 375-386 |
|---|---|
| Main authors: | , , , , |
| Format: | Journal Article |
| Language: | English |
| Publication details: | Elsevier Ltd, 1 November 2017 |
| ISSN: | 0031-3203, 1873-5142 |
| Summary: | • Study how to enhance the scalability of a density clustering algorithm using k-means. • Propose an accelerated density clustering algorithm based on k-means. • Propose an approximate density clustering algorithm based on k-means. • Show the effectiveness and efficiency of these algorithms.
Clustering by fast search and find of density peaks (CFSFDP) is a state-of-the-art density-based clustering algorithm that can effectively find clusters with arbitrary shapes. However, it requires calculating the distances between all pairs of points in a data set to determine the density and separation of each point. Consequently, its computational cost is extremely high for large-scale data sets. In this study, we investigate the application of the k-means algorithm, which is a fast clustering technique, to enhance the scalability of the CFSFDP algorithm while preserving its clustering results as far as possible. To this end, we propose two strategies. First, based on concept approximation, an acceleration algorithm (CFSFDP+A) involving fewer distance calculations is proposed to obtain the same clustering results as those of the original algorithm. Second, to further extend the scalability of the original algorithm, an approximate algorithm (CFSFDP+DE) based on exemplar clustering is proposed to rapidly obtain approximate clustering results of the original algorithm. Finally, experiments are conducted to illustrate the effectiveness and scalability of the proposed algorithms on several synthetic and real data sets. |
|---|---|
| DOI: | 10.1016/j.patcog.2017.06.023 |
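
The summary notes that the original CFSFDP algorithm needs the distance between every pair of points to obtain each point's density and separation. The following is a minimal sketch of that baseline computation only, not of the proposed CFSFDP+A or CFSFDP+DE algorithms, whose details this record does not give. The function name `density_peaks_scores`, the cutoff parameter `d_c`, the simple cutoff-count density, and the tie handling are all assumptions made for illustration.

```python
import math


def density_peaks_scores(points, d_c):
    """Return (rho, delta) for each point from all pairwise distances.

    rho[i]   : number of other points closer than the cutoff d_c
               (a simple cutoff-count density; an assumption here).
    delta[i] : distance to the nearest point with strictly higher density;
               points with no denser neighbour get their largest distance
               to any other point.
    """
    n = len(points)
    # Full pairwise distance matrix: the quadratic cost the summary refers to.
    dist = [[math.dist(p, q) for q in points] for p in points]

    rho = [
        sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
        for i in range(n)
    ]

    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


# Two small groups of points; d_c = 2.0 is an arbitrary illustrative cutoff.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
rho, delta = density_peaks_scores(pts, 2.0)
```

Points that combine a high `rho` with a high `delta` are the candidate cluster centres. Because the distance matrix grows quadratically with the number of points, this step is the scalability bottleneck the summary describes.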