Accelerating Multi - Process Communication for Parallel 3-D FFT

Today largest and most powerful supercomputers in the world are built on heterogeneous platforms; and using the combined power of multi-core CPUs and GPUs, has had a great impact accelerating large-scale applications. However, on these architectures, parallel algorithms, such as the Fast Fourier Tra...

Full description

Saved in:
Bibliographic Details
Published in:2021 Workshop on Exascale MPI (ExaMPI) pp. 46 - 53
Main Authors: Ayala, Alan, Tomov, Stan, Stoyanov, Miroslav, Haidar, Azzam, Dongarra, Jack
Format: Conference Proceeding
Language:English
Published: IEEE 01.11.2021
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Today largest and most powerful supercomputers in the world are built on heterogeneous platforms; and using the combined power of multi-core CPUs and GPUs, has had a great impact accelerating large-scale applications. However, on these architectures, parallel algorithms, such as the Fast Fourier Transform (FFT), encounter that inter-processor communication become a bottleneck and limits their scalability. In this paper, we present techniques for speeding up multi-process communication cost during the computation of FFTs, considering hybrid network connections as those expected on upcoming exascale machines. Among our techniques, we present algorithmic tuning, making use of phase diagrams; parametric tuning, using different FFT settings; and MPI distribution tuning based on FFT size and computational resources available. We present several experiments obtained on Summit supercomputer at Oak Ridge National Laboratory, using up to 40,960 IBM Power9 cores and 6,144 NVIDIA V-100 GPUs.
DOI:10.1109/ExaMPI54564.2021.00011