Accelerating Multi-Process Communication for Parallel 3-D FFT

Today, the largest and most powerful supercomputers in the world are built on heterogeneous platforms, and using the combined power of multi-core CPUs and GPUs has had a great impact on accelerating large-scale applications. However, on these architectures, parallel algorithms, such as the Fast Fourier Tra...

Bibliographic Details
Published in:	2021 Workshop on Exascale MPI (ExaMPI), pp. 46-53
Main Authors:	Ayala, Alan; Tomov, Stan; Stoyanov, Miroslav; Haidar, Azzam; Dongarra, Jack
Format:	Conference Proceeding
Language:	English
Published:	IEEE 01.11.2021
Subjects:	Exascale; FFT; Fast Fourier transforms; Graphics processing units; Hybrid systems; Libraries; MPI tuning; Scalability; Slabs; Software; Supercomputers

Abstract	Today, the largest and most powerful supercomputers in the world are built on heterogeneous platforms, and using the combined power of multi-core CPUs and GPUs has had a great impact on accelerating large-scale applications. However, on these architectures, inter-processor communication becomes a bottleneck for parallel algorithms such as the Fast Fourier Transform (FFT) and limits their scalability. In this paper, we present techniques for reducing the multi-process communication cost during the computation of FFTs, considering hybrid network connections such as those expected on upcoming exascale machines. Among our techniques, we present algorithmic tuning, making use of phase diagrams; parametric tuning, using different FFT settings; and MPI distribution tuning based on the FFT size and the computational resources available. We present several experiments obtained on the Summit supercomputer at Oak Ridge National Laboratory, using up to 40,960 IBM Power9 cores and 6,144 NVIDIA V100 GPUs.
Author	Haidar, Azzam; Ayala, Alan; Dongarra, Jack; Stoyanov, Miroslav; Tomov, Stan
Author_xml	– sequence: 1 givenname: Alan surname: Ayala fullname: Ayala, Alan organization: University of Tennessee, Knoxville, TN, USA – sequence: 2 givenname: Stan surname: Tomov fullname: Tomov, Stan organization: University of Tennessee, Knoxville, TN, USA – sequence: 3 givenname: Miroslav surname: Stoyanov fullname: Stoyanov, Miroslav organization: Oak Ridge National Laboratory, Oak Ridge, TN, USA – sequence: 4 givenname: Azzam surname: Haidar fullname: Haidar, Azzam organization: Nvidia Corporation, Santa Clara, CA, USA – sequence: 5 givenname: Jack surname: Dongarra fullname: Dongarra, Jack organization: University of Tennessee, Knoxville, TN, USA
CODEN	IEEPAD
ContentType	Conference Proceeding
DBID	6IE 6IL CBEJK RIE RIL
DOI	10.1109/ExaMPI54564.2021.00011
DatabaseName	IEEE Electronic Library (IEL) Conference Proceedings IEEE Xplore POP ALL IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml	– sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher
DeliveryMethod	fulltext_linktorsrc
EISBN	9781665411080 1665411082
EndPage	53
ExternalDocumentID	9652837
Genre	orig-research
GrantInformation_xml	– fundername: Office of Science and the National Nuclear Security Administration funderid: 10.13039/100006168
GroupedDBID	6IE 6IL CBEJK RIE RIL
ID	FETCH-LOGICAL-i203t-287d9290ed176e0a2f18cf61fc51eac9d77cb8ca76c6d5d6d65129cf667f139f3
IEDL.DBID	RIE
ISICitedReferencesCount	4
IngestDate	Thu Jun 29 18:37:46 EDT 2023
IsPeerReviewed	false
IsScholarly	false
Language	English
LinkModel	DirectLink
MergedId	FETCHMERGED-LOGICAL-i203t-287d9290ed176e0a2f18cf61fc51eac9d77cb8ca76c6d5d6d65129cf667f139f3
PageCount	8
ParticipantIDs	ieee_primary_9652837
PublicationCentury	2000
PublicationDate	2021-Nov.
PublicationDateYYYYMMDD	2021-11-01
PublicationDate_xml	– month: 11 year: 2021 text: 2021-Nov.
PublicationDecade	2020
PublicationTitle	2021 Workshop on Exascale MPI (ExaMPI)
PublicationTitleAbbrev	EXAMPI
PublicationYear	2021
Publisher	IEEE
Publisher_xml	– name: IEEE
SourceID	ieee
SourceType	Publisher
StartPage	46
URI	https://ieeexplore.ieee.org/document/9652837