A fast DBSCAN clustering algorithm by accelerating neighbor searching using Groups method

Density based clustering methods are proposed for clustering spatial databases with noise. Density Based Spatial Clustering of Applications with Noise (DBSCAN) can discover clusters of arbitrary shape and also handles outliers effectively. DBSCAN obtains clusters by finding the number of points with...

Celý popis

Uložené v:

Podrobná bibliografia
Vydané v:	Pattern recognition Ročník 58; s. 39 - 48
Hlavní autori:	Mahesh Kumar, K., Rama Mohan Reddy, A.
Médium:	Journal Article
Jazyk:	English
Vydavateľské údaje:	Elsevier Ltd 01.10.2016
Predmet:	Benchmarking Clustering Clusters Computation DBSCAN Density Density based clustering Neighborhood graph Noise Pattern recognition Searching Unsupervised learning DBSCAN Density based clustering Unsupervised learning Neighborhood graph
ISSN:	0031-3203, 1873-5142
On-line prístup:	Získať plný text
Tagy:	Pridať tag Žiadne tagy, Buďte prvý, kto otaguje tento záznam!

Popis
Shrnutí:	Density based clustering methods are proposed for clustering spatial databases with noise. Density Based Spatial Clustering of Applications with Noise (DBSCAN) can discover clusters of arbitrary shape and also handles outliers effectively. DBSCAN obtains clusters by finding the number of points within the specified distance from a given point. It involves computing distances from given point to all other points in the dataset. The conventional index based methods construct a hierarchical structure over the dataset to speed-up the neighbor search operations. The hierarchical index-structures fail to scale for datasets of dimensionality above 20. In this paper, we propose a novel graph-based index structure method Groups that accelerates the neighbor search operations and also scalable for high dimensional datasets. Experimental results show that the proposed method improves the speed of DBSCAN by a factor of about 1.5–2.2 on benchmark datasets. The performance of DBSCAN degrades considerably with noise due to unnecessary distance computations introduced by noise points while the proposed method is robust to noise by pruning out noise points early and eliminating the unnecessary distance computations. The cluster results produced by our method are exactly similar to that of DBSCAN but executed at a much faster pace. •A graph-based index structure is built for speeding up neighbor search operations.•No additional inputs are required to build the index structure.•Proposed method is scalable for high-dimensional datasets.•Handles noise effectively to improve the performance of DBSCAN.
Bibliografia:	ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23
ISSN:	0031-3203 1873-5142
DOI:	10.1016/j.patcog.2016.03.008