Pardicle: parallel approximate density-based clustering

Bibliographic details
Published in: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 560-571
Main authors: Patwary, Md. Mostofa Ali; Satish, Nadathur; Sundaram, Narayanan; Manne, Fredrik; Habib, Salman; Dubey, Pradeep
Format: Conference proceeding
Language: English
Published: Piscataway, NJ, USA: IEEE Press, 16 November 2014
IEEE
Series: ACM Conferences
ISBN: 1479955000, 9781479955008
ISSN: 2167-4329
Description
Abstract: DBSCAN is a widely used isodensity-based clustering algorithm for particle data, well known for its ability to isolate arbitrarily shaped clusters and to filter noise data. The algorithm is super-linear (O(n log n)) and computationally expensive for large datasets. Given the need for speed, we propose a fast heuristic algorithm for DBSCAN using density-based sampling, which performs equally well in quality compared to exact algorithms, but is more than an order of magnitude faster. Our experiments on astrophysics and synthetic massive datasets (8.5 billion numbers) show that our approximate algorithm is up to 56x faster than exact algorithms with almost identical quality (Omega-Index ≥ 0.99). We develop a new parallel DBSCAN algorithm, which uses dynamic partitioning to improve load balancing and locality. We demonstrate near-linear speedup on shared memory (15x using 16 cores, single node Intel® Xeon® processor) and distributed memory (3917x using 4096 cores, multinode) computers, with 2x additional performance improvement using Intel® Xeon Phi™ coprocessors. Additionally, existing exact algorithms can achieve up to 3.4 times speedup using dynamic partitioning.
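
Note: the "near-linear speedup" claim in the abstract can be checked with simple arithmetic, since parallel efficiency is speedup divided by core count. The sketch below is an illustration only and is not taken from the paper. The helper name `parallel_efficiency` and the dictionary layout are assumptions made for this example. The speedup and core figures are the ones quoted in the abstract.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Return speedup / cores; 1.0 would mean perfectly linear scaling."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return speedup / cores


# (speedup, cores) pairs as quoted in the abstract; the labels are illustrative.
reported = {
    "shared memory": (15.0, 16),
    "distributed memory": (3917.0, 4096),
}

# Hypothetical helper applied to each reported configuration.
efficiencies = {
    name: parallel_efficiency(speedup, cores)
    for name, (speedup, cores) in reported.items()
}

for name, eff in efficiencies.items():
    print(f"{name}: {eff:.1%} parallel efficiency")
```

Under this definition both configurations come out at roughly 94% to 96% efficiency, which is consistent with the abstract's description of the scaling as near-linear.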
DOI: 10.1109/SC.2014.51