Engineering High-Performance Community Detection Heuristics for Massive Graphs

The amount of graph-structured data has recently experienced an enormous growth in many applications. To transform such data into useful information, high-performance analytics algorithms and software tools are necessary. One common graph analytics kernel is community detection (or graph clustering)...

Celý popis

Uloženo v:
Podrobná bibliografie
Vydáno v:Proceedings of the International Conference on Parallel Processing s. 180 - 189
Hlavní autoři: Staudt, Christian L., Meyerhenke, Henning
Médium: Konferenční příspěvek
Jazyk:angličtina
Vydáno: IEEE 01.10.2013
Témata:
ISSN:0190-3918
On-line přístup:Získat plný text
Tagy: Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
Popis
Shrnutí:The amount of graph-structured data has recently experienced an enormous growth in many applications. To transform such data into useful information, high-performance analytics algorithms and software tools are necessary. One common graph analytics kernel is community detection (or graph clustering). Despite extensive research on heuristic solvers for this task, only few parallel codes exist, although parallelism is often necessary to scale to the data volume of real-world applications. We address the deficit in computing capability by a flexible and extensible clustering algorithm framework with shared-memory parallelism. Within this framework we implement our parallel variations of known sequential algorithms and combine them by an ensemble approach. In extensive experiments driven by the algorithm engineering paradigm, we identify the most successful parameters and combinations of these algorithms. The processing rate of our fastest algorithm exceeds 10M edges/second for many large graphs, making it suitable for massive data streams. Moreover, the strongest algorithm we developed yields a very good tradeoff between quality and speed.
ISSN:0190-3918
DOI:10.1109/ICPP.2013.27