Enhancing Kokkos with OpenACC

Bibliographic Details
Title:	Enhancing Kokkos with OpenACC
Authors:	Valero Lara, Pedro; Lee, Seyong; González Tallada, Marc; Denny, Joel; Teranishi, Keita; Vetter, Jeffrey
Contributors:	Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors; Universitat Politècnica de Catalunya. PM - Programming Models
Publisher Information:	SAGE publishing
Publication Year:	2024
Collection:	Universitat Politècnica de Catalunya, BarcelonaTech: UPCommons - Global access to UPC knowledge
Subject Terms:	UPC subject areas::Computer science::Programming, OpenACC, C++ metaprogramming, Kokkos, CUDA, OpenMP target, Parallel programming models
Description:	C++ template metaprogramming has emerged as a prominent approach for achieving performance portability in heterogeneous computing. Kokkos represents a notable paradigm in this domain, offering programmers a suite of high-level abstractions for generic programming while deferring much of the device-specific code generation and optimization to the compiler through template specializations. Kokkos furnishes a range of device-specific code specializations across multiple back ends, including CUDA and HIP. Diverging from conventional back ends, the OpenACC implementation presents a high-level, multicompiler, multidevice, and directive-based programming model. This paper presents recent advancements in the OpenACC back end for Kokkos (i.e., KokkACC) and focuses on its integration into the Kokkos ecosystem, exploration of automatic device selection capabilities to enhance productivity, and performance evaluation on modern hardware such as NVIDIA H100 GPUs. The study includes implementation details and a thorough performance assessment across various computational benchmarks, including minibenchmarks (AXPY and DOT product), miniapps (LULESH, MiniFE, and SNAP-LAMMPS), and a scientific kernel based on the lattice Boltzmann method.
Acknowledgments:	This research used resources from the Experimental Computing Laboratory and the Oak Ridge Leadership Computing Facility at Oak Ridge National Laboratory, which is supported by the Office of Science of the US Department of Energy (DOE) under contract DE-AC05-00OR22725. This research was supported in part by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the DOE Office of Science and the National Nuclear Security Administration. This research was also supported in part by the DOE Office of Science, Office of Advanced Scientific Computing Research, and Scientific Discovery through Advanced Computing program. This manuscript has been authored by UT-Battelle LLC under contract DE-AC05-00OR22725 with DOE.
Note:	Peer reviewed; postprint (author's final draft)
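The abstract's central mechanism, where generic code is written once and each back end supplies its own template specialization, can be illustrated with a minimal C++ sketch built around the two minibenchmarks it names (AXPY and DOT). Everything here is hypothetical: SerialBackend, OpenACCBackend, ParallelFor, ParallelReduce, axpy, and dot are illustrative names, not Kokkos or KokkACC types. The OpenACC specialization only marks where directives would go; compiled as ordinary C++, the `#pragma acc` lines are ignored and every loop runs on the host. Real offloading would also need OpenACC compiler support and device data management, which this sketch does not attempt.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical back-end tags (illustrative; not Kokkos execution spaces).
struct SerialBackend {};
struct OpenACCBackend {};

// Primary templates are declared but not defined: each back end must
// provide its own specialization, selected at compile time.
template <class Backend> struct ParallelFor;
template <class Backend> struct ParallelReduce;

template <> struct ParallelFor<SerialBackend> {
  template <class F> static void run(std::size_t n, F f) {
    for (std::size_t i = 0; i < n; ++i) f(i);
  }
};

template <> struct ParallelFor<OpenACCBackend> {
  template <class F> static void run(std::size_t n, F f) {
    // Marks where a directive-based back end would annotate the loop.
    // Without OpenACC support the pragma is ignored and the loop runs on the host.
#pragma acc parallel loop
    for (std::size_t i = 0; i < n; ++i) f(i);
  }
};

template <> struct ParallelReduce<SerialBackend> {
  template <class F> static double run(std::size_t n, F f) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) sum += f(i);
    return sum;
  }
};

template <> struct ParallelReduce<OpenACCBackend> {
  template <class F> static double run(std::size_t n, F f) {
    double sum = 0.0;
    // A sum reduction expressed as an OpenACC clause (ignored without OpenACC).
#pragma acc parallel loop reduction(+ : sum)
    for (std::size_t i = 0; i < n; ++i) sum += f(i);
    return sum;
  }
};

// Generic kernels: written once, mapped to a back end by the template argument.
// AXPY: y = a * x + y
template <class Backend>
void axpy(double a, const std::vector<double>& x, std::vector<double>& y) {
  assert(x.size() == y.size());
  const double* xp = x.data();
  double* yp = y.data();
  ParallelFor<Backend>::run(y.size(), [=](std::size_t i) { yp[i] += a * xp[i]; });
}

// DOT: sum over i of x[i] * y[i]
template <class Backend>
double dot(const std::vector<double>& x, const std::vector<double>& y) {
  assert(x.size() == y.size());
  const double* xp = x.data();
  const double* yp = y.data();
  return ParallelReduce<Backend>::run(x.size(), [=](std::size_t i) { return xp[i] * yp[i]; });
}
```

For example, with x holding four 1.0 values and y four 2.0 values, axpy<OpenACCBackend>(3.0, x, y) sets each y[i] to 5.0, and dot<SerialBackend>(x, y) then returns 20.0. Switching back ends changes only the template argument, not the kernel code.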
Document Type:	article in journal/newspaper
File Description:	18 p.; application/pdf
Language:	English
Relation:	https://hdl.handle.net/2117/419896
DOI:	10.1177/10943420241261987
Availability:	https://hdl.handle.net/2117/419896; https://doi.org/10.1177/10943420241261987
Rights:	Open Access
Accession Number:	edsbas.BF09E808
Database:	BASE
