A performance portable implementation of the semi-Lagrangian algorithm in six dimensions


Bibliographic details
Published in: Computer Physics Communications, Vol. 295, p. 108973
Main authors: Schild, Nils; Räth, Mario; Eibl, Sebastian; Hallatschek, Klaus; Kormann, Katharina
Format: Journal Article
Language: English
Published: Elsevier B.V., 1 February 2024
ISSN:0010-4655, 1879-2944
Description
Summary: This paper describes our approach to developing a simulation software application for the fully kinetic 6D Vlasov equation, which will be used to explore physics beyond the reduced gyrokinetic model. Simulating the fully kinetic Vlasov equation requires efficient utilization of compute and storage capabilities due to the high dimensionality of the problem. In addition, the implementation needs to be extensible regarding the physical model and flexible regarding the hardware for production runs. We begin with the algorithmic background for simulating the 6D Vlasov equation with a semi-Lagrangian algorithm. We then present the performance portable software stack, which enables production runs on pure CPU nodes as well as on AMD or Nvidia GPU-accelerated nodes. The extensibility of our implementation is guaranteed through the described software architecture of the main kernel, which achieves a memory bandwidth of almost 500 GB/s on an Nvidia V100 GPU and around 100 GB/s on an Intel Xeon Gold CPU using a single code base. We provide performance data on multiple node-level architectures and discuss which hardware capabilities are utilized and which remain available. Finally, the network communication bottleneck of 6D grid-based algorithms is quantified. A verification of physics beyond gyrokinetic theory, for the example of ion Bernstein waves, concludes the work.
• Performance portable implementation of a semi-Lagrangian algorithm for full kinetics.
• Software architecture for Lagrange interpolation stencils using design patterns.
• Node-level performance analysis for OpenMP, HIP and CUDA using a single code base.
• Quantification of the network communication bottleneck for 6D distributed grids.
DOI:10.1016/j.cpc.2023.108973