Enhancing Performance Through Control-Flow Unmerging and Loop Unrolling on GPUs

Bibliographic Details
Published in: Proceedings / International Symposium on Code Generation and Optimization, pp. 106-118
Main authors: Murtovi, Alnis; Georgakoudis, Giorgis; Parasyris, Konstantinos; Liao, Chunhua; Laguna, Ignacio; Steffen, Bernhard
Format: Conference paper
Language: English
Published: IEEE, 2 March 2024
ISSN: 2643-2838
Description
Summary: Compilers use a wide range of advanced optimizations to improve the quality of the machine code they generate. In most cases, these optimizations rely on precise program analyses. However, whenever control flow merges, information is lost because it is no longer possible to reason precisely about the program. One existing solution to this issue is code duplication, which involves duplicating instructions from merge blocks into their predecessors. This paper introduces a novel and more aggressive approach to code duplication, grounded in loop unrolling and control-flow unmerging, that enables subsequent optimizations that cannot be enabled by applying only one of these transformations. We implemented our approach inside LLVM and evaluated its performance on a collection of GPU benchmarks written in CUDA. Our results demonstrate that, even in the presence of branch divergence, which complicates code duplication across multiple branches and increases its cost, our optimization technique achieves performance improvements of up to 81%.
DOI: 10.1109/CGO57630.2024.10444819
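
Illustration (not part of the catalog record or the paper): the summary describes combining loop unrolling with control-flow unmerging so that values stay precisely known along each path. The following is a minimal hand-written sketch of that idea in plain C++, assuming a two-iteration loop. The function names `before` and `after` are hypothetical, and the transformation is applied by hand; it does not reproduce the authors' LLVM implementation.

```cpp
// Before: the value of k chosen in one iteration is used in the next one.
// After the if/else the two paths merge, so at the top of the second
// iteration k is "2 or 3" and the multiplication cannot use a constant.
int before(const int v[2]) {
    int k = 1;
    int sum = 0;
    for (int i = 0; i < 2; ++i) {
        sum += v[i] * k;
        if (v[i] > 0) {
            k = 2;
        } else {
            k = 3;
        }
        // control-flow merge: k is either 2 or 3 from here on
    }
    return sum;
}

// After (transformed by hand): the loop is fully unrolled, and the second
// iteration is duplicated into both branches of the first instead of being
// placed after their merge. On each path k is now a known constant.
int after(const int v[2]) {
    int sum = v[0];          // first iteration: k == 1
    if (v[0] > 0) {
        sum += v[1] * 2;     // second iteration, path where k == 2
    } else {
        sum += v[1] * 3;     // second iteration, path where k == 3
    }
    // The branch that updated k in the second iteration had no remaining
    // use, so it has been dropped as dead code.
    return sum;
}
```

In this sketch, unrolling alone would still leave k as a merged value in the second iteration, and duplicating code within a single iteration alone would leave nothing after the merge to specialize; only the combination exposes the constants. The cost is the duplicated second iteration, which is the kind of overhead the summary notes can grow under branch divergence on GPUs.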