Multiple‐GPU parallelization of three‐dimensional material point method based on single‐root complex



Bibliographic details
Published in: International Journal for Numerical Methods in Engineering, Vol. 123, No. 6, pp. 1481–1504
Authors: Dong, Youkou; Cui, Lan; Zhang, Xue
Format: Journal Article
Language: English
Published: Hoboken, USA: John Wiley & Sons, Inc., 30 March 2022
Wiley Subscription Services, Inc.
ISSN: 0029-5981, 1097-0207
Online access: Full text
Description
Abstract: As one of the arbitrary Lagrangian–Eulerian methods, the material point method (MPM) has intrinsic advantages in the simulation of large-deformation problems, combining the merits of the Lagrangian and Eulerian approaches. The MPM is nevertheless computationally intensive, because a very fine mesh is needed to achieve sufficiently high accuracy. A new multiple-GPU parallel strategy is developed, based on a single-root complex architecture of the computer, purely within a CUDA environment. Peer-to-Peer (P2P) communication between the GPUs exchanges the information of the crossing particles and ghost element nodes, and is faster than the heavy send/receive operations between different computers over an InfiniBand network. Domain decomposition splits the whole computational task over the GPUs as a number of subdomains. The computations within each subdomain are allocated to a corresponding GPU using an enhanced "Particle-List" scheme to tackle the data race during the interpolation from associated particles to common nodes. The acceleration effect of the parallelization is evaluated with two benchmark cases, a mini-slump test after a dam break and a cone penetration test in clay, where the maximum speedups with 1 and 8 GPUs are 88 and 604, respectively.
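The data race mentioned in the abstract arises because, in particle-to-grid interpolation, many particles write contributions to the same shared grid node. A "Particle-List" style scheme inverts the loop into a node-centric gather: each node holds a list of its associated particles, so each (conceptual) thread writes only to its own node. The following is a minimal serial Python sketch of that idea in 1D with linear shape functions; all names and the 1D setting are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch of a node-centric "Particle-List" gather for MPM
# particle-to-grid interpolation (1D, linear shape functions).

def build_particle_lists(particle_pos, n_nodes, dx):
    """For each grid node, collect the particles whose linear shape
    function overlaps it (the two nodes bracketing each particle)."""
    lists = [[] for _ in range(n_nodes)]
    for p, x in enumerate(particle_pos):
        i = int(x / dx)            # left node of the particle's cell
        lists[i].append(p)
        if i + 1 < n_nodes:
            lists[i + 1].append(p)
    return lists

def gather_mass(particle_pos, particle_mass, n_nodes, dx):
    """Node-centric gather: each node sums contributions only from its
    own particle list, so no two 'threads' ever write to the same node."""
    lists = build_particle_lists(particle_pos, n_nodes, dx)
    node_mass = [0.0] * n_nodes
    for i, plist in enumerate(lists):   # conceptually: one GPU thread per node
        xi = i * dx                     # node coordinate
        for p in plist:
            # linear (tent) shape function centred on node i
            w = max(0.0, 1.0 - abs(particle_pos[p] - xi) / dx)
            node_mass[i] += w * particle_mass[p]
    return node_mass

# Two unit-mass particles on a 3-node grid with dx = 1
masses = gather_mass([0.25, 0.75], [1.0, 1.0], 3, 1.0)
```

On a GPU, the naive particle-centric scatter would need atomic additions on the shared nodes; the gather form trades that contention for the cost of building per-node lists each step.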
Bibliography: Funding information
National Natural Science Foundation of China, 51909248; Zhejiang University; Dalian University of Technology; State Key Laboratory of Coastal and Offshore Engineering
DOI:10.1002/nme.6906