Pardicle: parallel approximate density-based clustering

Bibliographic details
Published in:	Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 560-571
Main authors:	Patwary, Md. Mostofa Ali; Satish, Nadathur; Sundaram, Narayanan; Manne, Fredrik; Habib, Salman; Dubey, Pradeep
Format:	Conference proceeding
Language:	English
Published:	Piscataway, NJ, USA: IEEE Press / IEEE, 16 November 2014
Series:	ACM Conferences
Subjects:	Approximate clustering algorithm; Approximation algorithms; Approximation methods; Clustering algorithms; Computer systems organization > Dependable and fault-tolerant systems and networks; Computing methodologies > Machine learning > Learning paradigms > Unsupervised learning > Cluster analysis; Data structures; Density based clustering; Disjoint-set data structure; General and reference > Cross-computing tools and techniques > Performance; Heuristic algorithms; Instruction sets; Mathematics of computing > Mathematical analysis > Functional analysis > Approximation; Networks > Network performance evaluation; Partitioning algorithms; Theory of computation > Design and analysis of algorithms > Approximation algorithms analysis; Union-Find algorithm
ISBN:	1479955000, 9781479955008
ISSN:	2167-4329
Online access:	Full text

Description
Summary:	Dbscan is a widely used isodensity-based clustering algorithm for particle data, well known for its ability to isolate arbitrarily shaped clusters and to filter noise data. The algorithm is super-linear (O(n log n)) and computationally expensive for large datasets. Given the need for speed, we propose a fast heuristic algorithm for Dbscan using density-based sampling, which matches the quality of exact algorithms but is more than an order of magnitude faster. Our experiments on astrophysics and synthetic massive datasets (8.5 billion numbers) show that our approximate algorithm is up to 56x faster than exact algorithms with almost identical quality (Omega-Index ≥ 0.99). We develop a new parallel Dbscan algorithm, which uses dynamic partitioning to improve load balancing and locality. We demonstrate near-linear speedup on shared memory (15x using 16 cores, single-node Intel® Xeon® processor) and distributed memory (3917x using 4096 cores, multinode) computers, with a 2x additional performance improvement using Intel® Xeon Phi™ coprocessors. Additionally, existing exact algorithms can achieve up to a 3.4x speedup using dynamic partitioning.
DOI:	10.1109/SC.2014.51
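
The "near-linear speedup" claim in the summary can be checked with a little arithmetic: parallel efficiency is the reported speedup divided by the number of cores, so the quoted figures correspond to roughly 94% (shared memory) and 96% (distributed memory). The following is a minimal sketch of that calculation. The function name `parallel_efficiency` and the dictionary layout are illustrative assumptions and are not taken from the paper.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Return speedup divided by core count (1.0 would be perfectly linear scaling)."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return speedup / cores


# Speedup figures as quoted in the summary: (speedup, number of cores).
reported = {
    "shared memory": (15.0, 16),
    "distributed memory": (3917.0, 4096),
}

for label, (speedup, cores) in reported.items():
    # Format as a percentage with one decimal place.
    print(f"{label}: {parallel_efficiency(speedup, cores):.1%} efficiency")
```

An efficiency close to 1.0 is what "near-linear" means here: adding cores yields almost proportional gains in speed.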