High Performance FFT Based Poisson Solver on a CPU-GPU Heterogeneous Platform


Bibliographic details
Published in: 2013 IEEE 27th International Symposium on Parallel and Distributed Processing, pp. 115-125
Main authors: Jing Wu, Joseph JaJa
Format: Conference paper
Language: English
Published: IEEE, 1 May 2013
ISBN: 146736066X, 9781467360661
ISSN: 1530-2075
Description
Summary: We develop an optimized FFT-based Poisson solver on a CPU-GPU heterogeneous platform for the case when the input is too large to fit in GPU global memory. The solver involves memory-bound computations such as the 3D FFT, in which the large 3D data may have to be transferred over the PCIe bus several times during the computation. We develop a new strategy to decompose and allocate the computation between the GPU and the CPU such that the 3D data is transferred only once to the device memory, and the execution of the GPU kernels is almost completely overlapped with the PCIe data transfer. We achieve significantly better performance than reported in previous related work, including over 50 GFLOPS for three periodic boundary conditions and over 40 GFLOPS for two periodic and one Neumann boundary conditions. The achieved PCIe bus bandwidth is over 5 GB/s, which is close to the best possible on our platform. For all the cases tested, the single 3D PCIe transfer, whose time constitutes a lower bound on what is possible on our platform, takes almost 70% of the total execution time of the Poisson solver.
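Illustrative note: the summary refers to an FFT-based Poisson solver with periodic boundary conditions. The sketch below shows only the underlying spectral idea, reduced to one dimension and written in plain Python. It is not the authors' implementation and involves no GPU or PCIe transfer. The function names `fft` and `solve_poisson_periodic_1d`, the sign convention u'' = f, and the zero-mean choice for the solution are assumptions made for this illustration.

```python
import cmath
import math


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def solve_poisson_periodic_1d(f, length):
    """Solve u'' = f on a uniform periodic grid (assumed convention).

    f      : samples of the right-hand side, len(f) a power of two
    length : period of the domain
    Returns the zero-mean solution u at the same grid points.
    """
    n = len(f)
    f_hat = fft([complex(v) for v in f])
    u_hat = [0j] * n  # mode 0 stays zero: this fixes the mean of u
    for k in range(1, n):
        m = k if k <= n // 2 else k - n      # signed mode number
        kappa = 2.0 * math.pi * m / length   # angular wavenumber
        # Differentiating twice multiplies mode k by -(kappa^2).
        u_hat[k] = -f_hat[k] / (kappa * kappa)
    u = fft(u_hat, inverse=True)
    return [v.real / n for v in u]  # normalize the inverse transform
```

For example, on a grid of 64 points over a period of 2π, passing samples of -sin(x) as `f` recovers sin(x) up to rounding error. A 3D solver applies the same division by the squared wavenumber after a 3D FFT, which is why the transform dominates the memory traffic described in the summary.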
DOI:10.1109/IPDPS.2013.18