Partitioning-Aware Performance Modeling of Distributed Graph Processing Tasks

Bibliographic details
Published in: International Journal of Parallel Programming, Vol. 51, No. 4-5, pp. 231-255
Main authors: Presser, Daniel; Siqueira, Frank
Format: Journal Article
Language: English
Published: New York: Springer US, 1 October 2023
Springer Nature B.V
ISSN: 0885-7458, 1573-7640
Description
Summary: Much of the data produced at large scale by modern applications represents connected entities and their relationships, which can be modeled as large graphs. To extract valuable information from these large datasets, several parallel and distributed graph processing engines have been proposed. These systems are designed to run on large clusters, where resources must be allocated efficiently. To address this problem, this paper presents a performance prediction model for GPS, a popular Pregel-based graph processing framework. By leveraging a micro-partitioning technique, our system can use various partitioning algorithms that greatly reduce execution time compared with the simple hash partitioning commonly used in graph processing systems. Experimental results show that the prediction model has accuracy close to 90%, allowing it to be used in schedulers or to estimate the cost of running graph processing tasks.
DOI: 10.1007/s10766-023-00753-w