Partitioning-Aware Performance Modeling of Distributed Graph Processing Tasks

Much of the data being produced in large scale by modern applications represents connected entities and their relationships, that can be modeled as large graphs. In order to extract valuable information from these large datasets, several parallel and distributed graph processing engines have been pr...

Full description

Saved in:
Bibliographic Details
Published in:International journal of parallel programming Vol. 51; no. 4-5; pp. 231 - 255
Main Authors: Presser, Daniel, Siqueira, Frank
Format: Journal Article
Language:English
Published: New York Springer US 01.10.2023
Springer Nature B.V
Subjects:
ISSN:0885-7458, 1573-7640
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Much of the data being produced in large scale by modern applications represents connected entities and their relationships, that can be modeled as large graphs. In order to extract valuable information from these large datasets, several parallel and distributed graph processing engines have been proposed. These systems are designed to run in large clusters, where resources must by allocated efficiently. Aiming to handle this problem, this paper presents a performance prediction model for GPS, a popular Pregel-based graph processing framework. By leveraging a micro-partitioning technique, our system can use various partitioning algorithms that greatly reduce the execution time, comparing with the simple hash partitioning that is commonly used in graph processing systems. Experimental results show that the prediction model has accuracy close to 90%, allowing it to be used in schedulers or to estimate the cost of running graph processing tasks.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:0885-7458
1573-7640
DOI:10.1007/s10766-023-00753-w