HY-DBSCAN: A hybrid parallel DBSCAN clustering algorithm scalable on distributed-memory computers

Bibliographic details
Published in: Journal of Parallel and Distributed Computing, Vol. 168, pp. 57-69
Main authors: Wu, Guoqing; Cao, Liqiang; Tian, Hongyun; Wang, Wei
Format: Journal Article
Language: English
Published: Elsevier Inc, 1 October 2022
ISSN: 0743-7315
Description
Summary:
• A parallel, scalable DBSCAN algorithm that outperforms other implementations.
• Optimizations for data partitioning, spatial indexing, and cluster merging.
• Hybrid parallelization that takes advantage of modern HPC architectures.
• Demonstration of the accuracy, performance, and scalability of the algorithm.

DBSCAN is a density-based clustering algorithm well known for its ability to discover clusters of arbitrary shape and to distinguish noise. Because it is computationally expensive for large datasets, the parallelization of DBSCAN has received considerable research attention. In this paper we present an exact, efficient, and scalable parallel DBSCAN algorithm, which we call HY-DBSCAN. It employs three major techniques to enable scalable data clustering on distributed-memory computers: i) a modified kd-tree for domain decomposition, ii) a spatial indexing approach based on grid and inference, and iii) a cluster merging scheme based on a distributed version of Rem's Union-Find algorithm. Moreover, HY-DBSCAN exploits both process-level and thread-level parallelization. In experiments, we demonstrate performance and scalability using two scientific datasets on up to 2048 cores of a distributed-memory computer. Through extensive evaluation, we show that HY-DBSCAN significantly outperforms previous state-of-the-art DBSCAN implementations.
DOI: 10.1016/j.jpdc.2022.06.005
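
The summary names Rem's Union-Find algorithm as the basis of the cluster merging scheme but does not describe the distributed variant. As an illustration only, the following is a minimal sequential sketch of Rem's algorithm with splicing, assuming points are identified by integer indices and each point starts in its own set. The function names `rem_union` and `find` are hypothetical and are not taken from the paper.

```python
def find(parent, x):
    """Follow parent pointers until the root of x's set is reached."""
    while parent[x] != x:
        x = parent[x]
    return x


def rem_union(parent, x, y):
    """Merge the sets containing x and y (Rem's algorithm with splicing).

    Invariant: parent[i] >= i, so every root has the largest index in its tree.
    Returns True if two distinct sets were merged, False if x and y already
    belonged to the same set.
    """
    rx, ry = x, y
    while parent[rx] != parent[ry]:
        if parent[rx] < parent[ry]:
            if rx == parent[rx]:          # rx is a root: link it under ry's parent
                parent[rx] = parent[ry]
                return True
            old = parent[rx]              # splice: repoint rx, then climb x's tree
            parent[rx] = parent[ry]
            rx = old
        else:
            if ry == parent[ry]:          # ry is a root: link it under rx's parent
                parent[ry] = parent[rx]
                return True
            old = parent[ry]
            parent[ry] = parent[rx]
            ry = old
    return False


# Hypothetical usage: six points, with pairs found to be density-connected.
parent = list(range(6))
merged = [rem_union(parent, a, b) for a, b in [(0, 1), (1, 2), (3, 4), (0, 2)]]
labels = [find(parent, i) for i in range(6)]
```

In a DBSCAN setting, each call to `rem_union` would correspond to a pair of points found to belong to the same cluster, and the final `find` pass assigns one representative label per cluster. How HY-DBSCAN exchanges such union requests between processes is not covered by this record.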