Distributed Graph Computation Meets Machine Learning

Bibliographic Details
Published in:	IEEE Transactions on Parallel and Distributed Systems, Vol. 31, No. 7, pp. 1588-1604
Main Authors:	Xiao, Wencong; Xue, Jilong; Miao, Youshan; Li, Zhen; Chen, Cheng; Wu, Ming; Li, Wei; Zhou, Lidong
Format:	Journal Article
Language:	English
Published:	New York: IEEE, 1 July 2020; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:	Algorithms; distributed machine learning; Fault tolerance; Graph computing; Graph theory; heterogeneity; Layouts; Machine learning; MEGA model; Optimization; stale synchronous parallel
ISSN:	1045-9219, 1558-2183

Description
Summary:	TuX² is a new distributed graph engine that bridges graph computation and distributed machine learning. TuX² inherits the benefits of an elegant graph computation model, an efficient graph layout, and balanced parallelism to scale to billion-edge graphs, while being extended and optimized for distributed machine learning to support heterogeneity in the data model, Stale Synchronous Parallel in scheduling, and a new Mini-batch, Exchange, GlobalSync, and Apply (MEGA) model for programming. TuX² further introduces a hybrid vertex-cut graph optimization and supports various consistency models in fault tolerance for machine learning. We have developed a set of representative distributed machine learning algorithms in TuX², covering both supervised and unsupervised learning. Compared to implementations on distributed machine learning platforms, writing those algorithms in TuX² takes only about 25 percent of the code, because our graph computation model hides the detailed management of data layout, partitioning, and parallelism from developers. An extensive evaluation of TuX², using large datasets with up to 64 billion edges, shows that TuX² outperforms PowerGraph/PowerLyra, the state-of-the-art distributed graph engines, by an order of magnitude, while beating two state-of-the-art distributed machine learning systems by at least 60 percent.
DOI:	10.1109/TPDS.2020.2970047