Multiple‐GPU parallelization of three‐dimensional material point method based on single‐root complex

Published in: International Journal for Numerical Methods in Engineering, Volume 123, Issue 6, pp. 1481–1504
Main authors: Dong, Youkou; Cui, Lan; Zhang, Xue
Format: Journal Article
Language: English
Publication details: Hoboken, USA: John Wiley & Sons, Inc., 30 March 2022
Wiley Subscription Services, Inc.
ISSN: 0029-5981, 1097-0207
Description
Summary: As one of the arbitrary Lagrangian–Eulerian methods, the material point method (MPM) has intrinsic advantages in simulating large deformation problems by combining the merits of the Lagrangian and Eulerian approaches. The MPM is computationally intensive because a very fine mesh is needed to achieve sufficiently high accuracy. A new multiple-GPU parallel strategy is developed based on a single-root complex architecture of the computer, purely within a CUDA environment. Peer-to-Peer (P2P) communication between the GPUs is used to exchange information on the crossing particles and ghost element nodes, which is faster than the heavy send/receive operations between different computers over an InfiniBand network. Domain decomposition splits the whole computational task into a number of subdomains, one per GPU. The computations within each subdomain are allocated to the corresponding GPU using an enhanced "Particle-List" scheme to tackle the data race during the interpolation from associated particles to common nodes. The acceleration achieved by the parallelization is evaluated with two benchmark cases, a mini-slump test after a dam break and a cone penetration test in clay, where the maximum speedups with 1 and 8 GPUs are 88 and 604, respectively.
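
To illustrate the single-root-complex P2P exchange described in the summary, the following is a minimal CUDA sketch, not taken from the paper: it checks and enables peer access between two GPUs sharing one PCIe root complex and copies a ghost-node buffer directly from device 0 to device 1 with cudaMemcpyPeer, avoiding a staging copy through host memory. The buffer names and size (ghost_send, ghost_recv, nGhostBytes) are hypothetical placeholders; the paper's actual exchange of crossing particles and ghost element nodes is more involved.

    // Sketch only: enable P2P access between GPUs 0 and 1 and copy a ghost buffer.
    // Identifiers below are illustrative, not from the paper.
    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        int canAccess01 = 0, canAccess10 = 0;
        cudaDeviceCanAccessPeer(&canAccess01, 0, 1);  // can device 0 access device 1?
        cudaDeviceCanAccessPeer(&canAccess10, 1, 0);  // and vice versa?
        if (!canAccess01 || !canAccess10) {
            std::printf("P2P not available between devices 0 and 1\n");
            return 1;
        }

        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);             // flags argument must be 0
        cudaSetDevice(1);
        cudaDeviceEnablePeerAccess(0, 0);

        const size_t nGhostBytes = 1 << 20;           // placeholder ghost-buffer size
        float *ghost_send = nullptr, *ghost_recv = nullptr;
        cudaSetDevice(0);
        cudaMalloc(&ghost_send, nGhostBytes);         // allocated on device 0
        cudaSetDevice(1);
        cudaMalloc(&ghost_recv, nGhostBytes);         // allocated on device 1

        // Direct device-to-device copy of ghost-node data across the root complex,
        // without routing the data through host memory.
        cudaMemcpyPeer(ghost_recv, 1, ghost_send, 0, nGhostBytes);
        cudaDeviceSynchronize();

        cudaFree(ghost_recv);
        cudaSetDevice(0);
        cudaFree(ghost_send);
        return 0;
    }
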
Funding information: National Natural Science Foundation of China, 51909248; Zhejiang University; Dalian University of Technology, State Key Laboratory of Coastal and Offshore Engineering
DOI: 10.1002/nme.6906