Impact of Varying BLAS Precision on DCMESH

Bibliographic details
Published in: SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1468-1480
Main authors: Piroozan, Nariman; Pennycook, S. John; Mohammed Razakh, Taufeq; Caday, Peter; Kumar, Nalini; Nakano, Aiichiro
Format: Conference paper
Language: English
Published: IEEE, 17 November 2024
Description
Abstract: The limiting factor in applying high-accuracy quantum molecular simulations to large systems has been their high computational cost, in terms of both compute power and memory. In this paper we explore the use of various BLAS precision modes (BF16, TF32, and Complex 3M) in DCMESH (divide-and-conquer Maxwell-Ehrenfest-surface hopping), a framework used to study light-matter interaction. On a single stack of the Intel® Data Center GPU Max Series 1550, we achieve a speedup of 1.35x while retaining accuracy in key output parameters such as the number of excited electrons, current density, and kinetic energy. For large problem sizes, we observe speedups of up to 3.91x for individual BLAS calls. Switching between BLAS precision modes requires no source code changes, only environment variables, so the approach demonstrated here could readily be applied to other High Performance Computing (HPC) workloads that spend a significant amount of time in BLAS calls.
DOI: 10.1109/SCW63240.2024.00187
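The abstract lists Complex 3M as one of the BLAS precision modes studied. The sketch below is only a rough illustration of the general 3M technique in NumPy, not the oneMKL or DCMESH implementation. It computes a complex matrix product (Ar + iAi)(Br + iBi) with three real matrix multiplications instead of four, trading one multiplication for extra additions. The function names are hypothetical, and float64 arrays are used for simplicity. Because the imaginary part is recovered by subtracting two products from a third, 3M can round somewhat differently from the standard four-multiplication form. This is why it is treated as a precision mode rather than an exact drop-in replacement.

```python
import numpy as np


def complex_matmul_4m(A, B):
    """Reference form: (Ar + iAi)(Br + iBi) using four real matrix products."""
    Ar, Ai = A.real, A.imag
    Br, Bi = B.real, B.imag
    real = Ar @ Br - Ai @ Bi
    imag = Ar @ Bi + Ai @ Br
    return real + 1j * imag


def complex_matmul_3m(A, B):
    """3M form: three real matrix products plus extra matrix additions.

    (Ar + Ai)(Br + Bi) = ArBr + ArBi + AiBr + AiBi, so subtracting
    T1 = ArBr and T2 = AiBi leaves exactly the imaginary part ArBi + AiBr.
    """
    Ar, Ai = A.real, A.imag
    Br, Bi = B.real, B.imag
    T1 = Ar @ Br                  # real product 1
    T2 = Ai @ Bi                  # real product 2
    T3 = (Ar + Ai) @ (Br + Bi)    # real product 3
    real = T1 - T2
    imag = T3 - T1 - T2
    return real + 1j * imag


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((256, 128)) + 1j * rng.standard_normal((256, 128))
    B = rng.standard_normal((128, 64)) + 1j * rng.standard_normal((128, 64))
    ref = A @ B
    # Report how far each variant drifts from NumPy's own complex product.
    print("4M max abs diff:", np.max(np.abs(complex_matmul_4m(A, B) - ref)))
    print("3M max abs diff:", np.max(np.abs(complex_matmul_3m(A, B) - ref)))
```

With random complex inputs, both functions agree with NumPy's `A @ B` to within floating-point tolerance. For large matrices the real matrix products dominate the cost, so dropping one of the four in exchange for a few cheaper matrix additions is where the 3M saving comes from.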