Bipartite-Oriented Distributed Graph Partitioning for Big Learning

Many machine learning and data mining (MLDM] problems like recommendation, topic modeling, and medical diagnosis can be modeled as computing on bipartite graphs. However, inost distributed graph-parallel systems are oblivious to the unique characteristics in such graphs and existing online graph par...

Full description

Saved in:
Bibliographic Details
Published in:Journal of computer science and technology Vol. 30; no. 1; pp. 20 - 29
Main Author: 陈榕 施佳鑫 陈海波 臧斌宇
Format: Journal Article
Language:English
Published: Boston Springer US 2015
Springer Nature B.V
Subjects:
ISSN:1000-9000, 1860-4749
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Many machine learning and data mining (MLDM] problems like recommendation, topic modeling, and medical diagnosis can be modeled as computing on bipartite graphs. However, inost distributed graph-parallel systems are oblivious to the unique characteristics in such graphs and existing online graph partitioning algorithms usually cause excessive repli- cation of vertices as well as significant pressure on network communication. This article identifies the challenges and oppor- tunities of partitioning bipartite graphs for distributed MLDM processing and proposes BiGraph, a set of bipartite-oriented graph partitioning algorithms. BiGraph leverages observations such as the skewed distribution of vertices, discriminated computation load and imbalanced data sizes between the two subsets of vertices to derive a set of optimal graph partition- ing algorithms that result in minimal vertex replication and network communication. BiGraph has been implemented on PowerGraph and is shown to have a performance boost up to 17.75X (from 1.16X) for four typical MLDM algorithnls, due to reducing up to 80% vertex replication, and up to 96% network traffic.
Bibliography:bipartite graph, graph partitioning, graph-parallel system
Many machine learning and data mining (MLDM] problems like recommendation, topic modeling, and medical diagnosis can be modeled as computing on bipartite graphs. However, inost distributed graph-parallel systems are oblivious to the unique characteristics in such graphs and existing online graph partitioning algorithms usually cause excessive repli- cation of vertices as well as significant pressure on network communication. This article identifies the challenges and oppor- tunities of partitioning bipartite graphs for distributed MLDM processing and proposes BiGraph, a set of bipartite-oriented graph partitioning algorithms. BiGraph leverages observations such as the skewed distribution of vertices, discriminated computation load and imbalanced data sizes between the two subsets of vertices to derive a set of optimal graph partition- ing algorithms that result in minimal vertex replication and network communication. BiGraph has been implemented on PowerGraph and is shown to have a performance boost up to 17.75X (from 1.16X) for four typical MLDM algorithnls, due to reducing up to 80% vertex replication, and up to 96% network traffic.
11-2296/TP
ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
content type line 23
ISSN:1000-9000
1860-4749
DOI:10.1007/s11390-015-1501-x