Automatic Code Generation for High-Performance Graph Algorithms

Bibliographic details
Published in: 2023 32nd International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 14-26
Main authors: Peng, Zhen; Ashraf, Rizwan A.; Guo, Luanzheng; Tian, Ruiqin; Kestor, Gokcen
Format: Conference paper
Language: English
Publication details: IEEE, 21 October 2023
Description
Summary: Graph problems are common across many fields, from scientific computing to social sciences. Despite their importance and the attention they have received, implementing graph algorithms effectively on modern computing systems remains a challenging task that requires significant programming effort and generally results in customized implementations. Current computing and memory hierarchies are not architected for irregular computations, resulting in performance that is far from the theoretical architectural peak. In this paper, we propose a compiler framework to simplify the development of graph algorithm implementations that can achieve high performance on modern computing systems. We provide a high-level domain-specific language (DSL) to represent graph algorithms through sparse linear algebra expressions and graph primitives including semiring and masking. The compiler leverages the semantic information expressed through the DSL during the optimization and code transformation passes, resulting in more efficient IR passed to the compiler backend. In particular, we introduce an Index Tree Dialect that preserves the semantic information of the graph algorithm to perform high-level, domain-specific optimizations, including workspace transformation, two-phase computation, and automatic parallelization. We demonstrate that this work outperforms the state-of-the-art graph library LAGraph, with speedups of up to 3.7× in semiring operations, 2.19× in an important sparse computational kernel, and 9.05× in graph processing algorithms.
DOI: 10.1109/PACT58117.2023.00010
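
Illustrative note: the summary mentions expressing graph algorithms through sparse linear algebra with semiring and masking primitives. The sketch below shows that general idea in plain Python. It is not the paper's DSL, compiler, or Index Tree Dialect, and it uses no real graph library. The names `masked_vxm` and `bfs_levels`, and the dict-of-dicts sparse layout, are assumptions made only for this example. Breadth-first search is written as a repeated vector-matrix product over the Boolean (OR, AND) semiring, with already visited vertices acting as a complemented mask.

```python
from typing import Callable, Container, Dict

# Assumed sparse layout for this sketch only:
#   matrix = {row: {col: value}}, vector = {index: value}
SparseVec = Dict[int, object]
SparseMat = Dict[int, Dict[int, object]]


def masked_vxm(vec: SparseVec, mat: SparseMat,
               add: Callable, mul: Callable,
               mask_out: Container[int]) -> SparseVec:
    """Hypothetical helper: w = vec * mat over the semiring (add, mul).

    Output indices found in mask_out are skipped, which plays the role
    of a complemented mask.
    """
    result: SparseVec = {}
    for i, v in vec.items():
        for j, a in mat.get(i, {}).items():
            if j in mask_out:
                continue  # masked: never computed or stored
            prod = mul(v, a)
            result[j] = add(result[j], prod) if j in result else prod
    return result


def bfs_levels(adj: SparseMat, source: int) -> Dict[int, int]:
    """BFS as repeated masked products over the Boolean (OR, AND) semiring."""
    levels = {source: 0}
    frontier: SparseVec = {source: True}
    depth = 0
    while frontier:
        depth += 1
        # Next frontier: neighbours of the current one, minus visited vertices.
        frontier = masked_vxm(frontier, adj,
                              add=lambda x, y: x or y,
                              mul=lambda x, y: x and y,
                              mask_out=levels)
        for j in frontier:
            levels[j] = depth
    return levels


# Small directed graph with a cycle (4 -> 0) and a diamond (0 -> 1,2 -> 3).
adj = {0: {1: True, 2: True}, 1: {3: True}, 2: {3: True},
       3: {4: True}, 4: {0: True}}
levels = bfs_levels(adj, 0)
```

Swapping the two operators changes the algorithm without touching the loop structure. For example, passing `add=min` and `mul=lambda x, y: x + y` with numeric edge weights gives one relaxation step of a shortest-path computation.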