SunwayLB: Enabling Extreme-Scale Lattice Boltzmann Method Based Computing Fluid Dynamics Simulations on Advanced Heterogeneous Supercomputers
| Published in: | IEEE Transactions on Parallel and Distributed Systems, Vol. 35, no. 2, pp. 324-337 |
|---|---|
| Main Authors: | , , , , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.02.2024 |
| ISSN: | 1045-9219, 1558-2183 |
| Summary: | The Lattice Boltzmann Method (LBM) is a class of Computational Fluid Dynamics (CFD) methods that models the fluid as fictive particles. In this paper, we report our work on SunwayLB, which enables LBM-based solutions aimed at industrial applications on advanced heterogeneous systems such as the Sunway supercomputers. We propose several techniques to boost simulation speed and improve the scalability of SunwayLB, including a customized multi-level domain decomposition and data-sharing scheme, a carefully orchestrated strategy for fusing kernels with different performance constraints to balance the workload, and optimization strategies for assembly code. With these optimizations, we scale SunwayLB on three advanced systems: Sunway TaihuLight, the new Sunway Supercomputer, and a GPU cluster. On Sunway TaihuLight, our largest simulation involves up to 5.6 trillion lattice cells, achieving 11,245 GLUPS (billion lattice cell updates per second), 77% memory bandwidth utilization, and a sustained performance of 4.7 PFlops. We further improve memory bandwidth utilization and computational efficiency by exploiting the unique features of the new generation of Sunway supercomputer. On the new Sunway Supercomputer, the largest simulation contains over 4.2 trillion lattice cells, reaching 6,583 GLUPS, 81% memory bandwidth utilization, and a sustained performance of 2.76 PFlops. To evaluate the portability of our code, we also adapt it to a GPU cluster with tailored optimization techniques, obtaining a 191x speedup and 83.8% memory bandwidth utilization. We present a series of computational experiments on extreme-scale fluid flows, as examples of real-world applications, to check the validity and performance of our work. The results show that our implementation can serve as a highly scalable and efficient solution for large-scale CFD problems on heterogeneous systems. |
|---|---|
| DOI: | 10.1109/TPDS.2023.3343706 |
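The abstract reports throughput in GLUPS, i.e. billions of lattice cell updates per second. To make "cell update" concrete, the following is a minimal sketch of a generic LBM time step, assuming a 2D D2Q9 lattice, the single-relaxation-time (BGK) collision operator, periodic boundaries, and a hypothetical initial condition. It is not SunwayLB's kernel: the record does not describe the lattice, boundary treatment, or fused kernels the authors use. Collision and streaming are fused into one pass here only as a simple illustration of kernel fusion; `equilibrium`, `step`, `run`, and `total_mass` are illustrative names.

```python
"""Minimal D2Q9 BGK lattice Boltzmann sketch (illustrative, not SunwayLB's kernel)."""
import time

# D2Q9 lattice: discrete velocities (cx, cy) and their quadrature weights.
CX = [0, 1, 0, -1, 0, 1, -1, -1, 1]
CY = [0, 0, 1, 0, -1, 1, 1, -1, -1]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho, ux, uy):
    """Second-order equilibrium distribution for a single cell."""
    usq = ux * ux + uy * uy
    feq = []
    for i in range(9):
        cu = CX[i] * ux + CY[i] * uy
        feq.append(W[i] * rho * (1 + 3 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq


def init(nx, ny, rho0=1.0):
    """Uniform density; hypothetical initial condition: left half moves in +x."""
    f = [[None] * ny for _ in range(nx)]
    for x in range(nx):
        for y in range(ny):
            ux = 0.01 if x < nx // 2 else 0.0
            f[x][y] = equilibrium(rho0, ux, 0.0)
    return f


def step(f, nx, ny, tau):
    """One fused collide-and-stream update over a periodic nx-by-ny grid."""
    new = [[[0.0] * 9 for _ in range(ny)] for _ in range(nx)]
    for x in range(nx):
        for y in range(ny):
            fi = f[x][y]
            # Macroscopic moments of the particle distributions.
            rho = sum(fi)
            ux = sum(fi[i] * CX[i] for i in range(9)) / rho
            uy = sum(fi[i] * CY[i] for i in range(9)) / rho
            feq = equilibrium(rho, ux, uy)
            for i in range(9):
                # BGK collision: relax toward equilibrium with time constant tau.
                post = fi[i] - (fi[i] - feq[i]) / tau
                # Streaming: push the post-collision value to the neighbour cell.
                new[(x + CX[i]) % nx][(y + CY[i]) % ny][i] = post
    return new


def total_mass(f):
    """Sum of all distributions; conserved by collision and streaming."""
    return sum(sum(cell) for col in f for cell in col)


def run(nx=32, ny=32, steps=20, tau=0.8):
    """Advance the lattice and report throughput in GLUPS."""
    f = init(nx, ny)
    start = time.perf_counter()
    for _ in range(steps):
        f = step(f, nx, ny, tau)
    elapsed = time.perf_counter() - start
    # GLUPS: billions of lattice cell updates per second.
    glups = nx * ny * steps / elapsed / 1e9
    return f, glups
```

Calling `f, glups = run()` advances a 32 x 32 lattice by 20 steps. `total_mass(f)` stays at 1024 up to rounding, because the equilibrium has the same density as the cell and streaming only moves values between cells. Pure Python reaches only a tiny fraction of a GLUPS. The figure is useful mainly because it uses the same unit the abstract reports.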