Dynamically Fusing Python HPC Kernels

Bibliographic details
Published in: Proceedings of the ACM on Software Engineering, Vol. 2, Issue ISSTA, pp. 1865-1886
Main authors: Al Awar, Nader; Naeem, Muhammad Hannan; Almgren-Bell, James; Biros, George; Gligoric, Milos
Format: Journal Article
Language: English
Published: New York, NY, USA: ACM, 22 June 2025
ISSN: 2994-970X
Description
Summary: Recent trends in high-performance computing show an increase in the adoption of performance-portable frameworks such as Kokkos and interpreted languages such as Python. PyKokkos follows these trends and enables programmers to write performance-portable kernels in Python, which greatly increases productivity. One issue that programmers still face is how to organize parallel code, as splitting code into separate kernels simplifies testing and debugging but may result in suboptimal performance. To enable programmers to organize kernels in any way they prefer while ensuring good performance, we present PyFuser, a program analysis framework for automatic fusion of performance-portable PyKokkos kernels. PyFuser dynamically traces kernel calls and lazily fuses them once the result is requested by the application. PyFuser generates fused kernels that execute faster due to better reuse of data, improved compiler optimizations, and reduced kernel launch overhead, while not requiring any changes to existing PyKokkos code. We also introduce automated code transformations that further optimize the fused kernels generated by PyFuser. Our experiments show that PyFuser achieves an average speedup of 3.8x over unfused kernels on NVIDIA and AMD GPUs, as well as Intel and AMD CPUs.
DOI: 10.1145/3728959
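
Illustration: the summary describes tracing kernel calls and fusing them lazily once a result is requested. The following is a toy sketch of that general idea in plain Python. It is not PyFuser's implementation and does not use the PyKokkos API; the class `LazyFuser`, its methods, and the kernels `scale` and `shift` are hypothetical names chosen for this example. It also assumes every kernel reads and writes only element `i`, which is what makes running the kernels back to back inside one loop equivalent to running them as separate loops.

```python
class LazyFuser:
    """Toy tracer (hypothetical): defers element-wise kernels and runs them fused."""

    def __init__(self):
        self.pending = []   # traced (kernel, data) calls that have not run yet
        self.launches = 0   # number of loops ("kernel launches") actually executed

    def call(self, kernel, data):
        # Record the call instead of executing it immediately.
        # Only calls over the same array are fused, so flush on a change of array.
        if self.pending and self.pending[-1][1] is not data:
            self.flush()
        self.pending.append((kernel, data))

    def flush(self):
        # Run all pending kernels in a single pass over the data.
        if not self.pending:
            return
        data = self.pending[0][1]
        kernels = [kernel for kernel, _ in self.pending]
        for i in range(len(data)):
            for kernel in kernels:   # fused body: one kernel after another per element
                kernel(i, data)
        self.launches += 1
        self.pending.clear()

    def result(self, data):
        # Requesting the result forces execution of everything traced so far.
        self.flush()
        return data


def scale(i, x):
    x[i] = x[i] * 2.0


def shift(i, x):
    x[i] = x[i] + 1.0


fuser = LazyFuser()
x = [1.0, 2.0, 3.0]
fuser.call(scale, x)   # traced, not executed
fuser.call(shift, x)   # traced, not executed
print(fuser.result(x), fuser.launches)  # [3.0, 5.0, 7.0] 1
```

In this sketch the two kernels cost one pass over the array instead of two, which loosely mirrors the data reuse and reduced launch overhead mentioned in the summary. A real system would also need a dependence analysis to decide when fusion is safe, which this example sidesteps through its element-wise assumption.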