Accelerating Multi-Process Communication for Parallel 3-D FFT

Bibliographic Details
Published in: 2021 Workshop on Exascale MPI (ExaMPI), pp. 46-53
Main Authors: Ayala, Alan; Tomov, Stan; Stoyanov, Miroslav; Haidar, Azzam; Dongarra, Jack
Format: Conference Proceeding
Language: English
Published: IEEE 01.11.2021
Abstract: Today, the largest and most powerful supercomputers in the world are built on heterogeneous platforms, and using the combined power of multi-core CPUs and GPUs has had a great impact on accelerating large-scale applications. However, on these architectures, inter-processor communication becomes a bottleneck for parallel algorithms such as the Fast Fourier Transform (FFT) and limits their scalability. In this paper, we present techniques for reducing multi-process communication cost during the computation of FFTs, considering hybrid network connections such as those expected on upcoming exascale machines. Our techniques include algorithmic tuning, making use of phase diagrams; parametric tuning, using different FFT settings; and MPI distribution tuning based on FFT size and available computational resources. We present several experiments performed on the Summit supercomputer at Oak Ridge National Laboratory, using up to 40,960 IBM Power9 cores and 6,144 NVIDIA V100 GPUs.
Author Affiliations:
1. Ayala, Alan: University of Tennessee, Knoxville, TN, USA
2. Tomov, Stan: University of Tennessee, Knoxville, TN, USA
3. Stoyanov, Miroslav: Oak Ridge National Laboratory, Oak Ridge, TN, USA
4. Haidar, Azzam: Nvidia Corporation, Santa Clara, CA, USA
5. Dongarra, Jack: University of Tennessee, Knoxville, TN, USA
CODEN IEEPAD
DOI 10.1109/ExaMPI54564.2021.00011
EISBN 9781665411080
1665411082
GrantInformation_xml – fundername: Office of Science and the National Nuclear Security Administration
  funderid: 10.13039/100006168
Subject Terms: Exascale FFT; Fast Fourier transforms; Graphics processing units; Hybrid systems; Libraries; MPI tuning; Scalability; Slabs; Software; Supercomputers
URI https://ieeexplore.ieee.org/document/9652837