A hybrid MPI–OpenMP scheme for scalable parallel pseudospectral computations for fluid turbulence

Bibliographic details
Published in: Parallel Computing, Volume 37, Issue 6, pp. 316-326
Main authors: Mininni, Pablo D., Rosenberg, Duane, Reddy, Raghu, Pouquet, Annick
Format: Journal Article
Language: English
Published: Elsevier B.V., 1 June 2011
ISSN: 0167-8191, 1872-7336
Description
Summary:
► A two-level hybrid OpenMP/MPI parallelization scheme is presented for pseudospectral computations of fluid turbulence.
► The hybrid scheme leads naturally to a new picture for the domain decomposition of the grids.
► The hybrid scheme scales well up to ∼20,000 compute cores with a maximum parallel efficiency of 89%.
► The method allows us to reduce the number of MPI tasks, and increase network bandwidth.
► The new scheme is competitive with the pure MPI-based method, but does not provide a clear “win” in our tests.

A hybrid scheme that utilizes MPI for distributed memory parallelism and OpenMP for shared memory parallelism is presented. The work is motivated by the desire to achieve exceptionally high Reynolds numbers in pseudospectral computations of fluid turbulence on emerging petascale, high core-count, massively parallel processing systems. The hybrid implementation derives from and augments a well-tested scalable MPI-parallelized pseudospectral code. The hybrid paradigm leads to a new picture for the domain decomposition of the pseudospectral grids, which is helpful in understanding, among other things, the 3D transpose of the global data that is necessary for the parallel fast Fourier transforms that are the central component of the numerical discretizations. Details of the hybrid implementation are provided, and performance tests illustrate the utility of the method. It is shown that the hybrid scheme achieves good scalability up to ∼20,000 compute cores with a maximum efficiency of 89%, and a mean of 79%. Data are presented that help guide the choice of the optimal number of MPI tasks and OpenMP threads in order to maximize code performance on two different platforms.
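
As a rough illustration of the two-level idea described in the summary, the following Python sketch splits a set of grid planes first among distributed-memory tasks and then among threads within each task, and computes a strong-scaling parallel efficiency. It is not taken from the paper: the slab-style split, the function names (`split_range`, `hybrid_decomposition`, `parallel_efficiency`), and the efficiency definition relative to a reference run are all assumptions made for this example.

```python
def split_range(n, parts):
    """Split range(n) into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(n, parts)
    bounds = []
    start = 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def hybrid_decomposition(n_planes, n_tasks, n_threads):
    """Hypothetical two-level split: planes -> tasks, then each task's planes -> threads.

    Returns {task: [(first_plane, end_plane), ...]} with one half-open range per thread.
    """
    layout = {}
    for task, (lo, hi) in enumerate(split_range(n_planes, n_tasks)):
        layout[task] = [(lo + a, lo + b) for a, b in split_range(hi - lo, n_threads)]
    return layout


def parallel_efficiency(t_ref, cores_ref, t, cores):
    """Strong-scaling efficiency of a run relative to a reference run (assumed definition)."""
    return (t_ref * cores_ref) / (t * cores)


if __name__ == "__main__":
    # 16 planes shared by 2 tasks with 4 threads each (illustrative numbers only).
    for task, ranges in hybrid_decomposition(16, 2, 4).items():
        print(task, ranges)
    # Made-up timings: 10 s on 100 cores versus 1.25 s on 1000 cores.
    print(parallel_efficiency(10.0, 100, 1.25, 1000))
```

In a sketch like this, keeping the total core count fixed while trading tasks for threads changes only the inner split, which is one way to picture how a hybrid scheme can reduce the number of MPI tasks.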
DOI: 10.1016/j.parco.2011.05.004