Instruction Scheduling for the GPU on the GPU

Bibliographic details
Published in:	Proceedings / International Symposium on Code Generation and Optimization, pp. 435-447
Main authors:	Shobaki, Ghassan; Muyan-Ozcelik, Pinar; Hutton, Josh; Linck, Bruce; Malyshenko, Vladislav; Kerbow, Austin; Ramirez-Ortega, Ronaldo; Gordon, Vahl Scott
Format:	Conference paper
Language:	English
Published:	IEEE, 2 March 2024
Subjects:	Ant Colony Optimization (ACO); Benchmark testing; GPU computing; Graphics processing units; instruction scheduling; multi-objective optimization; NP-hard problem; Operations research; Optimization; parallel compiler optimization; Production; Scheduling algorithms
ISSN:	2643-2838

Description
Summary:	In this paper, we show how to use the GPU to parallelize a precise instruction scheduling algorithm that is based on Ant Colony Optimization (ACO). ACO is a nature-inspired intelligent-search technique that has been used to compute precise solutions to NP-hard problems in operations research (OR). Such intelligent-search techniques were not used in the past to solve NP-hard compiler optimization problems, because they require substantially more computation than the heuristic techniques used in production compilers. In this work, we show that parallelizing such a compute-intensive technique on the GPU makes using it in compilation reasonably practical. The register-pressure-aware instruction scheduling problem addressed in this work is a multi-objective optimization problem that is significantly more complex than the problems that were previously solved using parallel ACO on the GPU. We describe a number of techniques that we have developed to efficiently parallelize an ACO algorithm for solving this multi-objective optimization problem on the GPU. The target processor is also a GPU. Our experimental evaluation shows that parallel ACO-based scheduling on the GPU runs up to 27 times faster than sequential ACO-based scheduling on the CPU, and this leads to reducing the total compile time of the rocPRIM benchmarks by 21%. ACO-based scheduling improves the execution speed of the compiled benchmarks by up to 74% relative to AMD's production scheduler. To the best of our knowledge, our work is the first successful attempt to parallelize a compiler optimization algorithm on the GPU.
DOI:	10.1109/CGO57630.2024.10444869