Multiple‐GPU parallelization of three‐dimensional material point method based on single‐root complex
| Published in: | International Journal for Numerical Methods in Engineering, vol. 123, no. 6, pp. 1481–1504 |
|---|---|
| Main authors: | , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Hoboken, USA: John Wiley & Sons, Inc; Wiley Subscription Services, Inc; 30 March 2022 |
| ISSN: | 0029-5981, 1097-0207 |
| Abstract: | As one of the arbitrary Lagrangian–Eulerian methods, the material point method (MPM) has intrinsic advantages in simulating large deformation problems because it combines the merits of the Lagrangian and Eulerian approaches. MPM calculations are computationally intensive, however, because a very fine mesh is needed to achieve sufficiently high accuracy. A new multiple‐GPU parallel strategy is developed based on a single‐root complex architecture of the computer, purely within a CUDA environment. Peer‐to‐Peer (P2P) communication between the GPUs is used to exchange information on the crossing particles and ghost element nodes; this is faster than the heavy send/receive operations between different computers over an InfiniBand network. Domain decomposition splits the whole computational task over the GPUs into a number of subdomains. The computations within each subdomain are allocated to a corresponding GPU, using an enhanced “Particle‐List” scheme to tackle the data race during the interpolation from associated particles to common nodes. The acceleration effect of the parallelization is evaluated with two benchmark cases, a mini‐slump test after a dam break and a cone penetration test in clay, in which the maximum speedups with 1 and 8 GPUs are 88 and 604, respectively. |
| Funding: | National Natural Science Foundation of China (51909248); Zhejiang University, Dalian University of Technology, State Key Laboratory of Coastal and Offshore Engineering |
| DOI: | 10.1002/nme.6906 |
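
The abstract names an enhanced “Particle‐List” scheme for avoiding the data race that occurs when many particles interpolate onto a shared node, but it does not describe the scheme itself. The following is a minimal sketch of the general node-centric idea only, under loudly stated assumptions: a 1D grid with linear shape functions (the paper is 3D), hypothetical function names, and plain Python loops standing in for CUDA threads. Each node gathers from its own list of particles, so every nodal entry has exactly one writer; the particle-centric scatter version is included for comparison, because on a GPU it would need atomic additions.

```python
import math
from typing import List


def shape_linear(xi: float) -> float:
    """1D linear hat function at normalized distance xi = (x - x_node) / h."""
    return max(0.0, 1.0 - abs(xi))


def build_particle_lists(xp: List[float], h: float, n_nodes: int) -> List[List[int]]:
    """For every grid node, collect the particles whose shape-function support covers it.

    With 1D linear shape functions a particle in cell c = floor(x / h)
    influences nodes c and c + 1, so its index is appended to both lists.
    """
    lists: List[List[int]] = [[] for _ in range(n_nodes)]
    for p, x in enumerate(xp):
        c = int(math.floor(x / h))
        for node in (c, c + 1):
            if 0 <= node < n_nodes:
                lists[node].append(p)
    return lists


def p2g_gather(xp: List[float], mp: List[float], h: float, n_nodes: int) -> List[float]:
    """Node-centric P2G of mass: each node writes only its own entry, so no data race."""
    lists = build_particle_lists(xp, h, n_nodes)
    mass = [0.0] * n_nodes
    for i in range(n_nodes):        # on a GPU: one thread per node
        acc = 0.0
        for p in lists[i]:          # particle data is only read here
            acc += mp[p] * shape_linear((xp[p] - i * h) / h)
        mass[i] = acc               # single writer per nodal entry
    return mass


def p2g_scatter(xp: List[float], mp: List[float], h: float, n_nodes: int) -> List[float]:
    """Particle-centric P2G: several particles add into the same node (needs atomics on a GPU)."""
    mass = [0.0] * n_nodes
    for p, x in enumerate(xp):      # on a GPU: one thread per particle
        c = int(math.floor(x / h))
        for node in (c, c + 1):
            if 0 <= node < n_nodes:
                mass[node] += mp[p] * shape_linear((x - node * h) / h)
    return mass
```

For three particles at x = 0.25, 0.5 and 1.75 with masses 1, 2 and 1 on a grid with h = 1 and three nodes, both versions give nodal masses [1.75, 1.5, 0.75]. The total, 4.0, equals the particle mass because linear shape functions form a partition of unity. The price of the gather form is building and storing the per-node lists each step.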
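
Domain decomposition with ghost-node and crossing-particle exchange can be illustrated in a simplified 1D setting. This sketch assumes a slab decomposition along x, with a contiguous range of cells per GPU; the abstract does not specify the subdomain shape, and all function names are hypothetical. It identifies two things: the interface nodes shared by neighboring subdomains, whose partial nodal values must be combined across GPUs after particle-to-grid interpolation, and the particles that moved into another GPU's subdomain during a step and must be transferred to it.

```python
import math
from typing import Dict, List, Tuple


def split_cells(n_cells: int, n_gpus: int) -> List[Tuple[int, int]]:
    """Slab decomposition: give each GPU a contiguous [start, end) range of cells along x."""
    base, extra = divmod(n_cells, n_gpus)
    bounds, start = [], 0
    for g in range(n_gpus):
        end = start + base + (1 if g < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def shared_interface_nodes(bounds: List[Tuple[int, int]]) -> List[int]:
    """Nodes on the boundary between neighboring slabs.

    With linear shape functions both neighbors accumulate partial values on
    these nodes, so each GPU needs its neighbor's contribution (ghost-node
    exchange) before the nodal update.
    """
    return [end for (_, end) in bounds[:-1]]


def owner_of(x: float, h: float, bounds: List[Tuple[int, int]]) -> int:
    """Index of the GPU whose cell range contains position x."""
    c = int(math.floor(x / h))
    for g, (start, end) in enumerate(bounds):
        if start <= c < end:
            return g
    raise ValueError(f"position {x} lies outside the computational domain")


def find_crossing_particles(
    xp: List[float], owner: List[int], h: float, bounds: List[Tuple[int, int]]
) -> Dict[Tuple[int, int], List[int]]:
    """Group particles that left their subdomain by (source GPU, destination GPU)."""
    moves: Dict[Tuple[int, int], List[int]] = {}
    for p, x in enumerate(xp):
        new = owner_of(x, h, bounds)
        if new != owner[p]:
            moves.setdefault((owner[p], new), []).append(p)
    return moves
```

For 10 cells on 3 GPUs, `split_cells` gives the ranges (0, 4), (4, 7) and (7, 10), with interface nodes 4 and 7. In a CUDA code on a single-root-complex machine, each (source, destination) batch would typically be copied directly between devices with the runtime's peer-to-peer calls, for example `cudaMemcpyPeer` after `cudaDeviceEnablePeerAccess`. The abstract does not say which calls the authors use.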
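
The two reported speedups also imply a multi-GPU scaling figure. Assuming, which the abstract does not state, that $S_1 = 88$ and $S_8 = 604$ were measured on the same benchmark against the same baseline, the gain from one to eight GPUs and the corresponding parallel efficiency are

$$
\frac{S_8}{S_1} = \frac{604}{88} \approx 6.86, \qquad E_8 = \frac{S_8}{8\,S_1} = \frac{604}{8 \times 88} \approx 0.86 .
$$

That is, under this assumption, eight GPUs deliver roughly 86% of ideal linear scaling relative to a single GPU.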