CuSP: A Customizable Streaming Edge Partitioner for Distributed Graph Analytics

Graph analytics systems must analyze graphs with billions of vertices and edges which require several terabytes of storage. Distributed-memory clusters are often used for analyzing such large graphs since the main memory of a single machine is usually restricted to a few hundreds of gigabytes. This...

Full description

Saved in:
Bibliographic Details
Published in:Proceedings - IEEE International Parallel and Distributed Processing Symposium pp. 439 - 450
Main Authors: Hoang, Loc, Dathathri, Roshan, Gill, Gurbinder, Pingali, Keshav
Format: Conference Proceeding
Language:English
Published: IEEE 01.05.2019
Subjects:
ISSN:1530-2075
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Graph analytics systems must analyze graphs with billions of vertices and edges which require several terabytes of storage. Distributed-memory clusters are often used for analyzing such large graphs since the main memory of a single machine is usually restricted to a few hundreds of gigabytes. This requires partitioning the graph among the machines in the cluster. Existing graph analytics systems usually come with a built-in partitioner that incorporates a particular partitioning policy, but the best partitioning policy is dependent on the algorithm, input graph, and platform. Therefore, built-in partitioners are not sufficiently flexible. Stand-alone graph partitioners are available, but they too implement only a small number of partitioning policies. This paper presents CuSP, a fast streaming edge partitioning framework which permits users to specify the desired partitioning policy at a high level of abstraction and generates high-quality graph partitions fast. For example, it can partition wdc12, the largest publicly available web-crawl graph, with 4 billion vertices and 129 billion edges, in under 2 minutes for clusters with 128 machines. Our experiments show that it can produce quality partitions 6× faster on average than the state-of-the-art standalone partitioner in the literature while supporting a wider range of partitioning policies.
ISSN:1530-2075
DOI:10.1109/IPDPS.2019.00054