Cross-Platform Optimization and Benchmarking of the Lattice Boltzmann Method on Heterogeneous Architectures

Bibliographic Details
Published in: 2025 IEEE 11th International Conference on High Performance and Smart Computing (HPSC), pp. 37-47
Main Authors: Zhu, Guanghui; Lv, Xiaojing; Liu, Zhao; Liu, Tao; Zhang, Wusheng; Fan, Yujing; Yu, Hongkun; Gao, Zhanyun; Shang, Jiandong
Format: Conference Proceeding
Language: English
Published: IEEE, 2025-05-09
Abstract The Lattice Boltzmann Method (LBM) has gained attention for its ability to handle complex fluid dynamics simulations, making it suitable for large-scale industrial applications. However, maximizing the performance of LBM on advanced heterogeneous architectures remains a challenge. In this work, we introduce a comprehensive software framework specifically designed to support large-scale LBM simulations for industrial applications. Our framework integrates essential components, including a mesh generator, pre-processing and post-processing modules, an efficient LBM solver, and additional features such as domain partitioning, parallel I/O, and visualization interfaces. This end-to-end solution aims to streamline large-scale LBM simulations and promote their application in industrial contexts. To achieve high performance and scalability, we propose several optimization techniques tailored to the new generation of heterogeneous supercomputing platforms, including the Sunway supercomputer and the Sugon supercomputer. Our approach includes a customized multi-level parallelization strategy, fusion of multiple kernels with different performance constraints, and code optimization techniques to fully exploit the computational power of these many-core processors. We achieve 2.76 PFLOPS sustained performance encompassing 4.2 trillion lattice cells coupled with 81.4% memory bandwidth on the new Sunway supercomputer. Scaling from a baseline of 128 MPI processes and 8 DCUs to 8,192 processes and 512 DCUs, the strong scaling efficiency reached 74.2% on a Sugon supercomputer. Our results demonstrate the framework's scalability and performance, highlighting its potential for enabling efficient, large-scale LBM simulations in industrial applications.
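Note on the strong-scaling figure: the abstract reports 74.2% strong scaling efficiency when going from 128 MPI processes and 8 DCUs to 8,192 processes and 512 DCUs. The sketch below works through what that figure would imply, assuming the conventional definition of strong scaling efficiency (achieved speedup divided by the resource ratio). The record does not state which definition the authors use, and the function names are hypothetical, chosen for illustration only.

```python
# Illustrative arithmetic only. Assumes the conventional definition:
#   efficiency = achieved_speedup / (scaled_resources / baseline_resources)
# The paper may define or measure efficiency differently.

def resource_ratio(base_units: int, scaled_units: int) -> float:
    """Ideal (linear) speedup when growing from base_units to scaled_units."""
    return scaled_units / base_units


def implied_speedup(base_units: int, scaled_units: int, efficiency: float) -> float:
    """Speedup implied by a given strong scaling efficiency."""
    return efficiency * resource_ratio(base_units, scaled_units)


# Figures quoted in the abstract for the Sugon supercomputer.
dcu_ratio = resource_ratio(8, 512)        # DCUs: 8 -> 512
mpi_ratio = resource_ratio(128, 8192)     # MPI processes: 128 -> 8,192
speedup = implied_speedup(8, 512, 0.742)  # reported efficiency: 74.2%

print(f"DCU ratio: {dcu_ratio:.0f}x, MPI process ratio: {mpi_ratio:.0f}x")
print(f"Implied speedup over the baseline: {speedup:.1f}x")
```

Under that assumed definition, both resource counts grow by the same factor of 64, so a 74.2% efficiency would correspond to a speedup of roughly 47.5x over the 8-DCU baseline for a fixed problem size.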
Author Liu, Zhao
Yu, Hongkun
Zhu, Guanghui
Shang, Jiandong
Fan, Yujing
Lv, Xiaojing
Liu, Tao
Zhang, Wusheng
Gao, Zhanyun
Author_xml – sequence: 1
  givenname: Guanghui
  surname: Zhu
  fullname: Zhu, Guanghui
  email: jn_zgh@126.com
  organization: School of Computer and Artificial Intelligence Zhengzhou University Michael Levitt Research Institute for Life Sciences and Digital Convergence Zhengzhou University of Technology National Supercomputing Center in Zhengzhou Zhengzhou University,Zhengzhou,China
– sequence: 2
  givenname: Xiaojing
  surname: Lv
  fullname: Lv, Xiaojing
  email: jing3704@126.com
  organization: National Supercomputing Center in Wuxi, China Ship Scientific Research Center, Wuxi, China
– sequence: 3
  givenname: Zhao
  surname: Liu
  fullname: Liu, Zhao
  email: liuz18@tsinghua.org.cn
  organization: Michael Levitt Research Institute for Life Sciences and Digital Convergence Zhengzhou University of Technology,National Supercomputing Center in Wuxi,Wuxi,China
– sequence: 4
  givenname: Tao
  surname: Liu
  fullname: Liu, Tao
  email: liut_nsccwx@163.com
  organization: National Supercomputing Center in Wuxi, Wuxi, China
– sequence: 5
  givenname: Wusheng
  surname: Zhang
  fullname: Zhang, Wusheng
  email: zws@tsinghua.edu.cn
  organization: Tsinghua University, Department of Computer Science and Technology, Beijing, China
– sequence: 6
  givenname: Yujing
  surname: Fan
  fullname: Fan, Yujing
  email: fanyujing0310@outlook.com
  organization: National Supercomputing Center in Wuxi, Wuxi, China
– sequence: 7
  givenname: Hongkun
  surname: Yu
  fullname: Yu, Hongkun
  email: yhk15@mails.tsinghua.edu.cn
  organization: Tsinghua University, Department of Computer Science and Technology, Beijing, China
– sequence: 8
  givenname: Zhanyun
  surname: Gao
  fullname: Gao, Zhanyun
  email: jh_g2y@gs.zzu.edu.cn
  organization: Zhengzhou University National Supercomputing Center in Zhengzhou Zhengzhou University,School of Computer and Artificial Intelligence,Zhengzhou,China
– sequence: 9
  givenname: Jiandong
  surname: Shang
  fullname: Shang, Jiandong
  email: sjd@zzu.edu.cn
  organization: Zhengzhou University National Supercomputing Center in Zhengzhou Zhengzhou University,School of Computer and Artificial Intelligence,Zhengzhou,China
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/HPSC66065.2025.00029
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 9798331596637
EndPage 47
ExternalDocumentID 11038767
Genre orig-research
GroupedDBID 6IE
6IL
CBEJK
RIE
RIL
ID FETCH-LOGICAL-i178t-dcebb18d0897d3c8e94e28f7c4bd3bf55eda0b1ad8718dff17cab71d7b1b20633
IEDL.DBID RIE
ISICitedReferencesCount 0
IngestDate Wed Jun 25 06:00:26 EDT 2025
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-i178t-dcebb18d0897d3c8e94e28f7c4bd3bf55eda0b1ad8718dff17cab71d7b1b20633
PageCount 11
ParticipantIDs ieee_primary_11038767
PublicationCentury 2000
PublicationDate 2025-May-9
PublicationDateYYYYMMDD 2025-05-09
PublicationDate_xml – month: 05
  year: 2025
  text: 2025-May-9
  day: 09
PublicationDecade 2020
PublicationTitle 2025 IEEE 11th International Conference on High Performance and Smart Computing (HPSC)
PublicationTitleAbbrev HPSC
PublicationYear 2025
Publisher IEEE
Publisher_xml – name: IEEE
Score 1.9080205
SourceID ieee
SourceType Publisher
StartPage 37
SubjectTerms Computational modeling
Heterogeneous (hybrid) systems
Kernel
Lattice Boltzmann Method
Lattice Boltzmann methods
Memory management
Next generation networking
Numerical Algorithms and Problems
Numerical models
Optimization
Program processors
Scalability
Supercomputers
Title Cross-Platform Optimization and Benchmarking of the Lattice Boltzmann Method on Heterogeneous Architectures
URI https://ieeexplore.ieee.org/document/11038767
WOSCitedRecordID wos001548132200007
hasFullText 1
inHoldings 1
isFullTextHit
isPrint