PARDICLE: Parallel Approximate Density-Based Clustering

Detailed bibliography
Published in: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 560-571
Main authors: Patwary, Md. Mostofa Ali; Satish, Nadathur; Sundaram, Narayanan; Manne, Fredrik; Habib, Salman; Dubey, Pradeep
Format: Conference paper
Language: English
Published: Piscataway, NJ, USA: IEEE Press, 16 November 2014
IEEE
Series: ACM Conferences
ISBN: 1479955000, 9781479955008
ISSN: 2167-4329
Description
Abstract: DBSCAN is a widely used isodensity-based clustering algorithm for particle data, well known for its ability to isolate arbitrarily shaped clusters and to filter out noise. The algorithm is superlinear (O(n log n)) and computationally expensive for large datasets. Given the need for speed, we propose a fast heuristic algorithm for DBSCAN using density-based sampling, which matches the quality of exact algorithms while being more than an order of magnitude faster. Our experiments on astrophysics and synthetic massive datasets (8.5 billion numbers) show that our approximate algorithm is up to 56x faster than exact algorithms with almost identical quality (Omega-Index ≥ 0.99). We also develop a new parallel DBSCAN algorithm that uses dynamic partitioning to improve load balancing and locality. We demonstrate near-linear speedup on shared-memory (15x using 16 cores on a single-node Intel® Xeon® processor) and distributed-memory (3917x using 4096 cores, multinode) computers, with a further 2x performance improvement using Intel® Xeon Phi™ coprocessors. Additionally, existing exact algorithms can achieve up to a 3.4x speedup using dynamic partitioning.
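For readers unfamiliar with DBSCAN, the following is a minimal sketch of the classic exact algorithm that the abstract uses as its baseline. It is not PARDICLE's approximate sampling method or its parallel dynamic-partitioning scheme, which this record does not describe in detail. The names `dbscan`, `region_query`, `eps`, and `min_pts` are illustrative assumptions. The neighbor search here is brute force (O(n) per query, O(n^2) overall) to keep the example self-contained; the O(n log n) figure cited above typically presumes a spatial index such as a kd-tree.

```python
from collections import deque
from math import dist

NOISE = -1        # label for points that belong to no cluster
UNVISITED = None  # label for points not yet processed


def region_query(points, i, eps):
    """Return indices of all points within distance eps of points[i] (brute force)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Classic exact DBSCAN: label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # points[i] is a core point: start a new cluster and expand it breadth-first
        labels[i] = cluster
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: density-reachable but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # only core points propagate the cluster
                queue.extend(j_neighbors)
        cluster += 1
    return labels
```

As a usage example, `dbscan([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)], eps=1.5, min_pts=3)` returns `[0, 0, 0, 1, 1, 1, -1]`: two tight groups of three points form clusters, and the isolated point is labeled noise. Swapping `region_query` for an index-backed lookup is the usual first step toward the large-scale setting the abstract targets.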
DOI: 10.1109/SC.2014.51