Partitioning-Aware Performance Modeling of Distributed Graph Processing Tasks
| Published in: | International Journal of Parallel Programming Vol. 51; no. 4-5; pp. 231-255 |
|---|---|
| Main Authors: | , |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: Springer US, 01.10.2023; Springer Nature B.V. |
| Subjects: | |
| ISSN: | 0885-7458, 1573-7640 |
| Summary: | Much of the data produced at large scale by modern applications represents connected entities and their relationships, which can be modeled as large graphs. To extract valuable information from these large datasets, several parallel and distributed graph processing engines have been proposed. These systems are designed to run in large clusters, where resources must be allocated efficiently. To address this problem, this paper presents a performance prediction model for GPS, a popular Pregel-based graph processing framework. By leveraging a micro-partitioning technique, our system can use various partitioning algorithms that greatly reduce execution time compared with the simple hash partitioning commonly used in graph processing systems. Experimental results show that the prediction model achieves accuracy close to 90%, allowing it to be used in schedulers or to estimate the cost of running graph processing tasks. |
| DOI: | 10.1007/s10766-023-00753-w |
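The summary contrasts hash partitioning with partitioning algorithms applied through micro-partitioning. The sketch below is a toy illustration of that contrast and is not the paper's implementation. It makes several assumptions. Vertices are integer ids on a small ring graph. Hash partitioning uses the identity hash (`v mod k`). A hypothetical `micro_partition` function stands in for a locality-aware partitioner by cutting ids into contiguous ranges. Micro-partitions are then grouped onto workers in contiguous blocks. The metric is edge cut: the number of edges whose endpoints land on different workers. In a Pregel-style system, messages along those edges must cross worker boundaries, so edge cut is a common proxy for communication cost.

```python
"""Toy comparison of hash partitioning and a micro-partition-based placement.

Illustrative assumptions (not the paper's implementation):
- vertices are integer ids 0..n-1 on a small undirected ring graph;
- micro-partitions are contiguous id ranges, a stand-in for the output of a
  locality-aware partitioning algorithm;
- micro-partitions are grouped onto workers in contiguous blocks.
"""


def ring_graph(n):
    """Undirected ring: vertex i is connected to (i + 1) mod n."""
    return [(i, (i + 1) % n) for i in range(n)]


def hash_partition(n, k):
    """Hash partitioning with the identity hash: vertex v goes to worker v mod k."""
    return {v: v % k for v in range(n)}


def micro_partition(n, m):
    """Hypothetical stand-in partitioner: split ids into m contiguous ranges."""
    size = -(-n // m)  # ceiling division
    return {v: v // size for v in range(n)}


def place_micro_partitions(micro_of, m, k):
    """Group m micro-partitions onto k workers in contiguous blocks."""
    per_worker = -(-m // k)  # ceiling division
    return {v: p // per_worker for v, p in micro_of.items()}


def edge_cut(edges, worker_of):
    """Count edges whose endpoints are placed on different workers."""
    return sum(1 for u, v in edges if worker_of[u] != worker_of[v])


if __name__ == "__main__":
    n, m = 8, 4
    edges = ring_graph(n)
    micro = micro_partition(n, m)  # computed once, reused for each worker count
    for k in (2, 4):
        hash_cut = edge_cut(edges, hash_partition(n, k))
        micro_cut = edge_cut(edges, place_micro_partitions(micro, m, k))
        print(f"k={k}: hash cut={hash_cut}, micro-partition cut={micro_cut}")
```

Running it prints `k=2: hash cut=8, micro-partition cut=2` and `k=4: hash cut=8, micro-partition cut=4`. On a ring, the identity hash separates every pair of neighbours, while contiguous micro-partitions keep most edges local. In this sketch, the same micro-partitioning is reused for both worker counts without re-running the partitioner. That reuse is an illustrative design choice, not a claim about how the paper's system is built.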