Fast density clustering strategies based on the k-means algorithm

Bibliographic details
Published in: Pattern Recognition, Vol. 71, pp. 375-386
Main authors: Bai, Liang; Cheng, Xueqi; Liang, Jiye; Shen, Huawei; Guo, Yike
Format: Journal Article
Language: English
Publication details: Elsevier Ltd, 1 November 2017
ISSN:0031-3203, 1873-5142
Description
Summary:
• Study how to enhance the scalability of a density clustering algorithm by k-means.
• Propose an accelerated algorithm of density clustering by k-means.
• Propose an approximate algorithm of density clustering by k-means.
• Show the effectiveness and efficiency of these algorithms.

Clustering by fast search and find of density peaks (CFSFDP) is a state-of-the-art density-based clustering algorithm that can effectively find clusters with arbitrary shapes. However, it requires calculating the distances between all the points in a data set to determine the density and separation of each point. Consequently, its computational cost is extremely high for large-scale data sets. In this study, we investigate the application of the k-means algorithm, which is a fast clustering technique, to enhance the scalability of the CFSFDP algorithm while maintaining its clustering results as far as possible. Toward this end, we propose two strategies. First, based on concept approximation, an acceleration algorithm (CFSFDP+A) involving fewer distance calculations is proposed to obtain the same clustering results as those of the original algorithm. Second, to further expand the scalability of the original algorithm, an approximate algorithm (CFSFDP+DE) based on exemplar clustering is proposed to rapidly obtain approximate clustering results of the original algorithm. Finally, experiments are conducted to illustrate the effectiveness and scalability of the proposed algorithms on several synthetic and real data sets.
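
To make the cost described in the summary concrete, the following is a minimal sketch of the baseline density-peaks computation as it is commonly described: the density of a point is the number of neighbours within a cutoff distance, and its separation is the distance to the nearest point of higher density. It is not the CFSFDP+A or CFSFDP+DE algorithm from the article. The function names, the cutoff parameter `d_c`, the tie-breaking rule, and the choice of centers by the largest density times separation product are assumptions made for illustration.

```python
import math


def density_peaks(points, d_c):
    """Return (rho, delta, nearest_denser) for every point.

    rho[i]            number of other points closer than the cutoff d_c
    delta[i]          distance to the nearest point of higher density
                      (for the densest point: its largest distance to any point)
    nearest_denser[i] index of that nearest denser point, or -1 for the densest
    """
    n = len(points)
    # Full pairwise distance matrix: the O(n^2) step the summary refers to.
    dist = [[math.dist(p, q) for q in points] for p in points]

    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    # Visit points from densest to sparsest. Assumption: ties in density
    # are broken by index, so an earlier point counts as "denser".
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    nearest_denser = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        nearest_denser[i] = j
    return rho, delta, nearest_denser


def assign_clusters(points, d_c, n_clusters):
    """Label each point, taking the n_clusters largest rho * delta as centers."""
    rho, delta, nearest_denser = density_peaks(points, d_c)
    n = len(points)
    centers = sorted(range(n), key=lambda i: (-rho[i] * delta[i], i))[:n_clusters]
    labels = [-1] * n
    for label, i in enumerate(centers):
        labels[i] = label
    # Non-center points inherit the label of their nearest denser point.
    # Processing in decreasing density guarantees that label is already set.
    for i in sorted(range(n), key=lambda i: (-rho[i], i)):
        if labels[i] == -1:
            labels[i] = labels[nearest_denser[i]]
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
           (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
    print(assign_clusters(pts, d_c=1.0, n_clusters=2))
```

The nested comprehension that builds `dist` is the part that grows quadratically with the number of points, which is the bottleneck the two proposed strategies aim to reduce.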
DOI:10.1016/j.patcog.2017.06.023