A parallel nonlinear multigrid solver for unsteady incompressible flow simulation on multi-GPU cluster
| Published in: | Journal of computational physics Vol. 414; p. 109447 |
|---|---|
| Main Authors: | , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Cambridge: Elsevier Inc / Elsevier Science Ltd, 01.08.2020 |
| ISSN: | 0021-9991, 1090-2716 |
| Summary: | A nonlinear multigrid solver for unsteady three-dimensional incompressible viscous flow is developed for multi-GPU clusters. The solver uses a full approximation scheme (FAS) V-cycle to accelerate the computation, with a Navier-Stokes solver based on the artificial compressibility method serving as the smoother. Multi-stream overlapping strategies are designed to assist multi-GPU computations. The numerical procedure is validated by computing 3D laminar and turbulent flows within a lid-driven cubic cavity. The predicted results compare favorably with previous benchmark solutions and measurements, in both mean and turbulent quantities. For the FAS V-cycle scheme, speedups of up to two orders of magnitude are reported, and the work unit (WU) count scales with the total grid number N as O(N^0.3) under the deepest FAS V-cycle. A detailed evaluation of the GPU implementation is carried out using the Roofline model and a scalability analysis.
• A parallel nonlinear multigrid solver for unsteady incompressible flow simulation is implemented on a multi-GPU cluster. • A Navier-Stokes solver based on the artificial compressibility method is used as the multigrid smoother. • For FAS Lev. 7, a 250-fold speedup over its single-grid counterpart is reported. • The work unit count scales with the total grid number N as O(N^0.3) under the deepest FAS V-cycle. • A detailed evaluation of the GPU implementation is carried out using the Roofline model and a scalability analysis. |
|---|---|
| DOI: | 10.1016/j.jcp.2020.109447 |
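
To give a feel for the scaling quoted in the summary, the short sketch below works out what a work unit count growing as N^0.3 would mean when a 3D grid is refined. It assumes a pure power law with an unknown constant, so only ratios are meaningful; the function name `work_unit_growth` and the refinement scenario are illustrative and do not come from the article.

```python
def work_unit_growth(n_ratio: float, exponent: float = 0.3) -> float:
    """Growth factor of the work unit count when the total grid number
    grows by n_ratio, assuming WU is proportional to N**exponent.

    The exponent 0.3 is the value quoted in the summary; the pure
    power-law form is an assumption made for this illustration.
    """
    if n_ratio <= 0:
        raise ValueError("n_ratio must be positive")
    return n_ratio ** exponent


# Doubling the resolution in each of the three directions
# multiplies the total grid number N by 2**3 = 8.
growth = work_unit_growth(8.0)
print(f"WU growth for 8x more grid points: {growth:.2f}x")
```

Under this assumption, an eightfold increase in grid points raises the work unit count by a factor of 8^0.3 = 2^0.9, a little under 2.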