Impact of Varying BLAS Precision on DCMESH
| Published in: | SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1468-1480 |
|---|---|
| Main authors: | , , , , , |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 17 November 2024 |
| Subjects: | |
| Online access: | Get full text |
| Summary: | The limiting factor in the application of high-accuracy quantum molecular simulations to large systems has been the associated high computational cost in terms of both compute power and memory. In this paper we explore the use of various BLAS precision modes (BF16, TF32, and Complex 3M) in DCMESH (divide-and-conquer Maxwell-Ehrenfest-surface hopping), a framework used to study light-matter interaction. On a single stack of the Intel® Data Center GPU Max Series 1550, we achieve a speedup of 1.35x while retaining accuracy in key output parameters such as the number of excited electrons, current density, and kinetic energy. For large problem sizes, we observe speedups of up to 3.91x for individual BLAS calls. Switching between BLAS precision modes requires no source code changes (only environment variables), so the approach we demonstrate here could be readily applied to other High Performance Computing (HPC) workloads that spend a significant amount of time in BLAS calls. |
| DOI: | 10.1109/SCW63240.2024.00187 |
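
The summary names "Complex 3M" as one of the BLAS modes without describing it. As a rough illustration only, the sketch below shows the standard 3M identity, which forms a complex matrix product from three real matrix products instead of the usual four. It is not taken from the paper, DCMESH, or any vendor BLAS: all function names here are hypothetical, and plain Python lists stand in for the optimized kernels a real library would use.

```python
# Illustrative sketch of the "3M" identity for complex matrix products.
# All helper names are hypothetical; this is not a vendor implementation.

def matmul(x, y):
    """Real matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]

def add(x, y):
    """Elementwise sum of two equally shaped matrices."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]

def sub(x, y):
    """Elementwise difference of two equally shaped matrices."""
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]

def complex_matmul_4m(ar, ai, br, bi):
    """(ar + i*ai)(br + i*bi) with the conventional four real products."""
    real = sub(matmul(ar, br), matmul(ai, bi))
    imag = add(matmul(ar, bi), matmul(ai, br))
    return real, imag

def complex_matmul_3m(ar, ai, br, bi):
    """Same product with three real products (the 3M identity)."""
    t1 = matmul(ar, br)                    # Ar * Br
    t2 = matmul(ai, bi)                    # Ai * Bi
    t3 = matmul(add(ar, ai), add(br, bi))  # (Ar + Ai) * (Br + Bi)
    real = sub(t1, t2)                     # Ar*Br - Ai*Bi
    imag = sub(sub(t3, t1), t2)            # Ar*Bi + Ai*Br
    return real, imag

if __name__ == "__main__":
    ar, ai = [[1, 2], [3, 4]], [[5, 6], [7, 8]]
    br, bi = [[1, 0], [2, 1]], [[0, 1], [1, 0]]
    print(complex_matmul_3m(ar, ai, br, bi))
    print(complex_matmul_4m(ar, ai, br, bi))
```

With integer inputs the two functions agree exactly. In floating point, the 3M form trades one multiplication for extra additions and rounds differently in the imaginary part, which is presumably why the paper checks accuracy in the physical outputs rather than assuming it.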