GRIDEN: An effective grid-based and density-based spatial clustering algorithm to support parallel computing

Bibliographic Details
Published in: Pattern Recognition Letters, Vol. 109, pp. 81-88
Main Authors: Deng, Chao; Song, Jinwei; Sun, Ruizhi; Cai, Saihua; Shi, Yinxue
Format: Journal Article
Language: English
Published: Amsterdam: Elsevier B.V., 15.07.2018
Elsevier Science Ltd
ISSN: 0167-8655, 1872-7344
Description
Summary:
• Proposes GRIDEN, a new effective density-based and grid-based clustering algorithm for massive spatial data.
• Presents a new concept of ε-neighbor cells to improve the clustering accuracy of grid-based algorithms.
• Presents a parallel computing algorithm for high-dimensional density-based clustering to achieve high performance.
• Supports multi-density clustering and incremental density-based clustering.

Density-based clustering has been widely used in many fields. This paper proposes GRIDEN, a new effective grid-based and density-based spatial clustering algorithm that supports parallel computing in addition to multi-density clustering. It constructs grids from hyper-square cells and gives users a parameter k that controls the balance between efficiency and accuracy, which makes the algorithm more flexible. Compared with conventional density-based algorithms, it achieves much higher performance by eliminating distance calculations among points, relying instead on the newly proposed concept of ε-neighbor cells. Compared with conventional grid-based algorithms, it uses a symmetric set of (2k+1)^D cells to identify dense cells and the density-connected relationships among cells. As a result, the maximum deviation in the computed ε-neighbor points of the grid-based algorithm can be held to an acceptable level through parameter k. The experimental results show that GRIDEN achieves a reliable clustering result that comes arbitrarily close to that of exact DBSCAN as parameter k grows, while requiring computational time that is only linear in N.
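The idea of counting points over a symmetric block of (2k+1)^D cells, instead of measuring point-to-point distances, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the cell side length ε/k, the `min_pts` threshold, and the function names `cell_of`, `neighbor_offsets`, and `dense_cells` are assumptions made for illustration only, and the summary does not specify these details.

```python
import math
from collections import Counter
from itertools import product


def cell_of(point, side):
    """Map a point to the integer index of the hyper-square cell containing it."""
    return tuple(math.floor(x / side) for x in point)


def neighbor_offsets(dim, k):
    """All offsets in a symmetric block of (2k+1)^dim cells, the centre included."""
    return list(product(range(-k, k + 1), repeat=dim))


def dense_cells(points, eps, k, min_pts):
    """Return cells whose surrounding (2k+1)^D block holds at least min_pts points.

    Assumption (not stated in the summary): the cell side is eps / k, so the
    block of cells approximates the eps-neighbourhood more finely as k grows.
    """
    side = eps / k
    counts = Counter(cell_of(p, side) for p in points)
    offsets = neighbor_offsets(len(points[0]), k)
    dense = set()
    for cell in counts:
        # Sum per-cell counts over the block; no point-to-point distances are computed.
        total = sum(
            counts.get(tuple(c + o for c, o in zip(cell, off)), 0)
            for off in offsets
        )
        if total >= min_pts:
            dense.add(cell)
    return dense


points = [(0.1, 0.1), (0.2, 0.2), (0.3, 0.1), (5.0, 5.0)]
result = dense_cells(points, eps=1.0, k=1, min_pts=3)
```

In this sketch each point is binned once and the remaining work is per occupied cell, which is the general reason grid-based counting scales well with the number of points. The block size (2k+1)^D grows quickly with the dimension D and with k, which is one way to read the efficiency/accuracy trade-off the summary attributes to k.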
DOI:10.1016/j.patrec.2017.11.011