Bipartite-Oriented Distributed Graph Partitioning for Big Learning
Many machine learning and data mining (MLDM] problems like recommendation, topic modeling, and medical diagnosis can be modeled as computing on bipartite graphs. However, inost distributed graph-parallel systems are oblivious to the unique characteristics in such graphs and existing online graph par...
Saved in:
| Published in: | Journal of computer science and technology Vol. 30; no. 1; pp. 20 - 29 |
|---|---|
| Main Author: | |
| Format: | Journal Article |
| Language: | English |
| Published: |
Boston
Springer US
2015
Springer Nature B.V |
| Subjects: | |
| ISSN: | 1000-9000, 1860-4749 |
| Online Access: | Get full text |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| Summary: | Many machine learning and data mining (MLDM] problems like recommendation, topic modeling, and medical diagnosis can be modeled as computing on bipartite graphs. However, inost distributed graph-parallel systems are oblivious to the unique characteristics in such graphs and existing online graph partitioning algorithms usually cause excessive repli- cation of vertices as well as significant pressure on network communication. This article identifies the challenges and oppor- tunities of partitioning bipartite graphs for distributed MLDM processing and proposes BiGraph, a set of bipartite-oriented graph partitioning algorithms. BiGraph leverages observations such as the skewed distribution of vertices, discriminated computation load and imbalanced data sizes between the two subsets of vertices to derive a set of optimal graph partition- ing algorithms that result in minimal vertex replication and network communication. BiGraph has been implemented on PowerGraph and is shown to have a performance boost up to 17.75X (from 1.16X) for four typical MLDM algorithnls, due to reducing up to 80% vertex replication, and up to 96% network traffic. |
|---|---|
| Bibliography: | bipartite graph, graph partitioning, graph-parallel system Many machine learning and data mining (MLDM] problems like recommendation, topic modeling, and medical diagnosis can be modeled as computing on bipartite graphs. However, inost distributed graph-parallel systems are oblivious to the unique characteristics in such graphs and existing online graph partitioning algorithms usually cause excessive repli- cation of vertices as well as significant pressure on network communication. This article identifies the challenges and oppor- tunities of partitioning bipartite graphs for distributed MLDM processing and proposes BiGraph, a set of bipartite-oriented graph partitioning algorithms. BiGraph leverages observations such as the skewed distribution of vertices, discriminated computation load and imbalanced data sizes between the two subsets of vertices to derive a set of optimal graph partition- ing algorithms that result in minimal vertex replication and network communication. BiGraph has been implemented on PowerGraph and is shown to have a performance boost up to 17.75X (from 1.16X) for four typical MLDM algorithnls, due to reducing up to 80% vertex replication, and up to 96% network traffic. 11-2296/TP ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 content type line 23 |
| ISSN: | 1000-9000 1860-4749 |
| DOI: | 10.1007/s11390-015-1501-x |