A performance portable implementation of the semi-Lagrangian algorithm in six dimensions

Bibliographic Details
Published in:Computer Physics Communications, Vol. 295, p. 108973
Main Authors: Schild, Nils; Räth, Mario; Eibl, Sebastian; Hallatschek, Klaus; Kormann, Katharina
Format: Journal Article
Language:English
Published: Elsevier B.V., 01.02.2024
ISSN:0010-4655, 1879-2944
Description
Summary:This paper describes our approach to developing a simulation software application for the fully kinetic 6D Vlasov equation, which will be used to explore physics beyond the reduced gyrokinetic model. Simulating the fully kinetic Vlasov equation requires efficient use of compute and storage capabilities because of the high dimensionality of the problem. In addition, the implementation needs to be extensible with respect to the physical model and flexible with respect to the hardware used for production runs. We start with the algorithmic background for simulating the 6D Vlasov equation using a semi-Lagrangian algorithm. We then present the performance portable software stack, which enables production runs on pure CPU nodes as well as on nodes accelerated by AMD or Nvidia GPUs. The extensibility of our implementation is ensured by the described software architecture of the main kernel, which achieves a memory bandwidth of almost 500 GB/s on an Nvidia V100 GPU and around 100 GB/s on an Intel Xeon Gold CPU using a single code base. We provide performance data on multiple node-level architectures and discuss the hardware capabilities that are used and those that remain available. Finally, the network communication bottleneck of 6D grid-based algorithms is quantified. A verification of physics beyond gyrokinetic theory, using ion Bernstein waves as the example, concludes the work.
• Performance portable implementation of a semi-Lagrangian algorithm for full kinetics.
• Software architecture for Lagrange interpolation stencils using design patterns.
• Node-level performance analysis for OpenMP, HIP and CUDA using a single code base.
• Quantification of the network communication bottleneck for 6D distributed grids.
DOI:10.1016/j.cpc.2023.108973
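
Illustration: The summary refers to a semi-Lagrangian algorithm built on Lagrange interpolation stencils. As a rough orientation only, the following is a minimal one-dimensional sketch of that idea for constant-velocity advection on a periodic grid. It is not the authors' 6D implementation; the function names `lagrange_weights` and `advect_periodic`, the parameter `cfl` (velocity times time step divided by grid spacing) and the default stencil degree are assumptions made for this example.

```python
import numpy as np


def lagrange_weights(alpha, offsets):
    """Lagrange interpolation weights for nodes at integer `offsets`,
    evaluated at the fractional position `alpha` (in grid cells)."""
    weights = np.ones(len(offsets))
    for j, xj in enumerate(offsets):
        for m, xm in enumerate(offsets):
            if m != j:
                weights[j] *= (alpha - xm) / (xj - xm)
    return weights


def advect_periodic(f, cfl, degree=3):
    """One semi-Lagrangian step for f_t + v f_x = 0 on a periodic grid.

    `cfl` = v * dt / dx (hypothetical parameter name). Each grid point
    traces its characteristic back by `cfl` cells and interpolates the
    old values there with a (degree + 1)-point Lagrange stencil.
    """
    shift = -cfl                      # foot of characteristic, in cells
    base = int(np.floor(shift))       # integer part of the displacement
    alpha = shift - base              # fractional part in [0, 1)
    offsets = np.arange(degree + 1) - degree // 2
    weights = lagrange_weights(alpha, offsets)

    f_new = np.zeros_like(f)
    for w, off in zip(weights, offsets):
        # np.roll(f, -k)[i] == f[i + k] with periodic wrap-around
        f_new += w * np.roll(f, -(base + int(off)))
    return f_new


if __name__ == "__main__":
    n = 64
    x = 2.0 * np.pi * np.arange(n) / n
    f0 = np.sin(x)
    f1 = advect_periodic(f0, cfl=0.4)
    # Compare against the exact translated profile
    exact = np.sin(x - 0.4 * (2.0 * np.pi / n))
    print("max error:", np.max(np.abs(f1 - exact)))
```

Because the interpolation weights depend only on the fractional displacement, the same stencil is applied at every grid point, so the update streams through memory once per stencil node. In higher dimensions a step of this kind would typically be applied direction by direction, which this sketch does not attempt.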