PGX.D: a fast distributed graph processing engine

Bibliographic details
Published in:	Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-12
Main authors:	Hong, Sungpack; Depner, Siegfried; Manhardt, Thomas; Van Der Lugt, Jan; Verstraaten, Merijn; Chafi, Hassan
Format:	Conference paper
Language:	English
Published:	New York, NY, USA: ACM, 15 November 2015
Series:	ACM Conferences
Subjects:	Algorithm design and analysis; Bandwidth; Clustering algorithms; Computational modeling; Computer systems organization > Dependable and fault-tolerant systems and networks; Data models; General and reference > Cross-computing tools and techniques > Performance; Kernel; Mathematics of computing > Discrete mathematics > Graph theory > Graph algorithms; Networks > Network performance evaluation; Programming; Theory of computation > Models of computation > Concurrency; Theory of computation > Models of computation > Concurrency > Parallel computing models
ISBN:	1450337236, 9781450337236
ISSN:	2167-4337

Description
Summary:	Graph analysis is a powerful method in data analysis. Although several frameworks have been proposed for processing large graph instances in distributed environments, their performance is much lower than that of efficient single-machine implementations provided with enough memory. In this paper, we present a fast distributed graph processing system, namely PGX.D. We show that PGX.D significantly outperforms other distributed graph systems such as GraphLab (3x to 90x). Furthermore, PGX.D on 4 to 16 machines is also faster than an implementation optimized for single-machine execution. Using a fast cooperative context-switching mechanism, we implement PGX.D as a low-overhead, bandwidth-efficient communication framework that supports remote data-pulling patterns. Moreover, PGX.D achieves large traffic reduction and good workload balance by applying selective ghost nodes, edge partitioning, and edge chunking transparently to the user. Our analysis confirms that each of these features is indeed crucial for the overall performance of certain kinds of graph algorithms. Finally, we advocate the use of balanced beefy clusters where the sustained random DRAM-access bandwidth in aggregate is matched with the bandwidth of the underlying interconnection fabric.
DOI:	10.1145/2807591.2807620