A performance predictor for implementation selection of parallelized static and temporal graph algorithms

Task‐based execution of graph workloads allows various ordered and unordered implementations, with tasks representing dependencies between graph vertices and edges. This work explores graph algorithms in the context of ordered and unordered task‐based implementations, that trade‐off work‐efficiency...

Celý popis

Uložené v:
Podrobná bibliografia
Vydané v:Concurrency and computation Ročník 34; číslo 2
Hlavní autori: Rehman, Akif, Ahmad, Masab, Khan, Omer
Médium: Journal Article
Jazyk:English
Vydavateľské údaje: Hoboken Wiley Subscription Services, Inc 25.01.2022
Predmet:
ISSN:1532-0626, 1532-0634
On-line prístup:Získať plný text
Tagy: Pridať tag
Žiadne tagy, Buďte prvý, kto otaguje tento záznam!
Popis
Shrnutí:Task‐based execution of graph workloads allows various ordered and unordered implementations, with tasks representing dependencies between graph vertices and edges. This work explores graph algorithms in the context of ordered and unordered task‐based implementations, that trade‐off work‐efficiency with parallelism. The monotonicity of convergent graph solutions is the reason behind the trade‐off between work‐efficiency and parallelism. This trade‐off results in variable performance‐based choices within and across different machines (CPUs and GPUs), graph algorithms, implementations (ordered, relaxed, and unordered). Input graphs also augment this choice space, with this work analyzing temporally changing graphs in addition to the static graphs explored by prior works. These algorithmic and architectural choices are first explored in this work, and it is seen that different graph workload‐input combinations perform optimally on diverse architectural configurations. The resulting choice space is analyzed and this work represents it in the form of characteristic variables that correlate with each choice space. Using these characteristic variables, this work proposes analytical and neural network models to correlate these choice spaces to find the best performing implementation. The variables and the prediction models proposed in this work are also integrated with a state‐of‐the‐art performance predictor on a multiaccelerator setup, and shows geometric performance gains of 54% on a CPU, 14% on a GPU, and 31.5% in a multiaccelerator setup over baseline implementations without performance prediction.
Bibliografia:Funding information
National Science Foundation, CNS‐1718481; U.S. Naval Research Laboratory
ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:1532-0626
1532-0634
DOI:10.1002/cpe.6267