Performance portability via C++ PSTL, SYCL, OpenMP, and HIP: the Gaia AVU-GSR case study

Bibliographic Details
Published in:	SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1152-1163
Main authors:	Malenza, Giulio; Cesare, Valentina; Santimaria, Marco Edoardo; Birke, Robert; Vecchiato, Alberto; Becciani, Ugo; Aldinucci, Marco
Format:	Conference proceeding
Language:	English
Published:	IEEE, 17 November 2024
Subjects:	Astrometry; C++ languages; Codes; CPU and GPU architectures; GPU programming; Graphics processing units; Hardware; Harmonic analysis; High performance computing; Hip; Performance portability; Portable languages; Stars; Supercomputers
Online access:	Full text

Description
Abstract:	Applications that analyze data from modern scientific experiments will soon require a computing capacity of ExaFLOPs. The current trend to achieve such performance is to employ GPU-accelerated supercomputers and design applications to optimally exploit this hardware. Since each supercomputer is typically a one-off project, the necessity of having computational languages portable across diverse CPU and GPU architectures without performance losses is increasingly compelling. Here, we study the performance portability of the LSQR algorithm as found in the AVU-GSR code of the ESA Gaia mission. This code computes the astrometric parameters of the ∼10^8 stars in our Galaxy. The LSQR algorithm is widely used across a broad range of high-performance computing (HPC) applications, elevating the study's relevance beyond the astrophysical domain. We developed different GPU-accelerated ports based on CUDA, C++ PSTL, SYCL, OpenMP, and HIP. We carefully verified the correctness of each port and tuned them to five different GPU-accelerated platforms from NVIDIA and AMD to evaluate the performance portability (ȹ) in terms of the harmonic mean of the application's performance efficiency across the tested hardware. HIP was demonstrated to be the most portable solution with a 0.94 average ȹ across the tested problem sizes, closely followed by SYCL coupled with AdaptiveCpp (ACPP) with 0.93. If we only consider NVIDIA platforms, CUDA would be the winner with 0.97. The tuning-oblivious C++ PSTL achieves 0.62 when coupled with vendor-specific compilers.
DOI:	10.1109/SCW63240.2024.00157
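
Note on the metric: the abstract defines performance portability as the harmonic mean of the application's performance efficiency across the tested hardware. The C++ sketch below illustrates that calculation only. The function name, the convention of returning 0 when a platform has zero efficiency or the platform set is empty, and the sample values in the note after the code are assumptions made for illustration; they are not taken from the paper.

```cpp
#include <cassert>
#include <vector>

// Illustrative sketch (hypothetical helper, not the authors' code):
// harmonic mean of per-platform performance efficiencies.
// Assumption: an efficiency of 0 means the port does not run on that
// platform, in which case the overall score is taken to be 0.
double performance_portability(const std::vector<double>& efficiencies) {
    if (efficiencies.empty()) {
        return 0.0;  // no platforms tested: nothing to average
    }
    double sum_of_inverses = 0.0;
    for (double e : efficiencies) {
        assert(e >= 0.0 && "efficiency must be non-negative");
        if (e == 0.0) {
            return 0.0;  // unsupported platform dominates the score
        }
        sum_of_inverses += 1.0 / e;
    }
    // Harmonic mean: number of platforms divided by the sum of 1/e_i.
    return static_cast<double>(efficiencies.size()) / sum_of_inverses;
}
```

With made-up efficiencies of 1.0 and 0.5 on two platforms, the function returns 2 / (1 + 2), about 0.67, which is lower than the arithmetic mean of 0.75. The harmonic mean penalizes a single poorly performing platform more strongly than an arithmetic average would.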