Cross-Platform Optimization and Benchmarking of the Lattice Boltzmann Method on Heterogeneous Architectures

Bibliographic Details
Published in: 2025 IEEE 11th International Conference on High Performance and Smart Computing (HPSC), pp. 37-47
Main authors: Zhu, Guanghui; Lv, Xiaojing; Liu, Zhao; Liu, Tao; Zhang, Wusheng; Fan, Yujing; Yu, Hongkun; Gao, Zhanyun; Shang, Jiandong
Format: Conference proceeding
Language: English
Published: IEEE, 9 May 2025
Online access: Full text
Description
Summary: The Lattice Boltzmann Method (LBM) has gained attention for its ability to handle complex fluid dynamics simulations, making it suitable for large-scale industrial applications. However, maximizing the performance of LBM on advanced heterogeneous architectures remains a challenge. In this work, we introduce a comprehensive software framework specifically designed to support large-scale LBM simulations for industrial applications. Our framework integrates essential components, including a mesh generator, pre-processing and post-processing modules, an efficient LBM solver, and additional features such as domain partitioning, parallel I/O, and visualization interfaces. This end-to-end solution aims to streamline large-scale LBM simulations and promote their use in industrial contexts. To achieve high performance and scalability, we propose several optimization techniques tailored to the new generation of heterogeneous supercomputing platforms, including the Sunway supercomputer and the Sugon supercomputer. Our approach includes a customized multi-level parallelization strategy, fusion of multiple kernels with different performance constraints, and code optimization techniques to fully exploit the computational power of these many-core processors. On the new Sunway supercomputer, we achieve 2.76 PFLOPS of sustained performance on a simulation of 4.2 trillion lattice cells, coupled with 81.4% memory bandwidth utilization. On a Sugon supercomputer, strong scaling efficiency reaches 74.2% when scaling from a baseline of 128 MPI processes and 8 DCUs to 8,192 processes and 512 DCUs. Our results demonstrate the framework's scalability and performance, highlighting its potential for enabling efficient, large-scale LBM simulations in industrial applications.
DOI: 10.1109/HPSC66065.2025.00029
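
Note on the strong-scaling figure: the summary reports 74.2% efficiency when going from 128 to 8,192 MPI processes (8 to 512 DCUs), a 64-fold increase in resources. The sketch below shows what that figure implies under the conventional definition of strong-scaling efficiency (achieved speedup divided by ideal speedup at fixed problem size). This record does not state how the authors computed the value, so the definition and the function names here are assumptions for illustration only.

```python
def strong_scaling_efficiency(t_base, t_scaled, n_base, n_scaled):
    """Conventional definition (assumed, not taken from the paper):
    achieved speedup divided by ideal speedup for a fixed problem size."""
    achieved_speedup = t_base / t_scaled
    ideal_speedup = n_scaled / n_base
    return achieved_speedup / ideal_speedup


def implied_speedup(efficiency, n_base, n_scaled):
    """Speedup implied by a given efficiency and resource increase."""
    return efficiency * (n_scaled / n_base)


if __name__ == "__main__":
    # Figures quoted in the summary: 128 -> 8,192 MPI processes at 74.2%.
    n_base, n_scaled = 128, 8192
    resource_ratio = n_scaled / n_base  # 64-fold increase
    speedup = implied_speedup(0.742, n_base, n_scaled)
    print(f"resource ratio: {resource_ratio:.0f}x")
    print(f"implied speedup: {speedup:.1f}x")
```

Under this assumed definition, a 64-fold increase in processes at 74.2% efficiency corresponds to a speedup of roughly 47.5 times over the 128-process baseline.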