Accelerating Multi-Process Communication for Parallel 3-D FFT
| Published in: | 2021 Workshop on Exascale MPI (ExaMPI), pp. 46-53 |
|---|---|
| Main authors: | Alan Ayala, Stan Tomov, Miroslav Stoyanov, Azzam Haidar, Jack Dongarra |
| Format: | Conference proceeding |
| Language: | English |
| Published: | IEEE, 1 November 2021 |
| Online access: | Full text |
| Abstract | Today's largest and most powerful supercomputers in the world are built on heterogeneous platforms, and using the combined power of multi-core CPUs and GPUs has had a great impact on accelerating large-scale applications. However, on these architectures, parallel algorithms such as the Fast Fourier Transform (FFT) suffer because inter-processor communication becomes a bottleneck and limits their scalability. In this paper, we present techniques for reducing the cost of multi-process communication during the computation of FFTs, considering hybrid network connections such as those expected on upcoming exascale machines. Our techniques include algorithmic tuning, making use of phase diagrams; parametric tuning, using different FFT settings; and MPI distribution tuning based on FFT size and the computational resources available. We present several experiments performed on the Summit supercomputer at Oak Ridge National Laboratory, using up to 40,960 IBM Power9 cores and 6,144 NVIDIA V100 GPUs. |
|---|---|
| Author | Haidar, Azzam; Ayala, Alan; Dongarra, Jack; Stoyanov, Miroslav; Tomov, Stan |
| Author_xml | 1. Alan Ayala (University of Tennessee, Knoxville, TN, USA); 2. Stan Tomov (University of Tennessee, Knoxville, TN, USA); 3. Miroslav Stoyanov (Oak Ridge National Laboratory, Oak Ridge, TN, USA); 4. Azzam Haidar (Nvidia Corporation, Santa Clara, CA, USA); 5. Jack Dongarra (University of Tennessee, Knoxville, TN, USA) |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DOI | 10.1109/ExaMPI54564.2021.00011 |
| EISBN | 9781665411080 1665411082 |
| EndPage | 53 |
| ExternalDocumentID | 9652837 |
| Genre | orig-research |
| GrantInformation_xml | Funder: Office of Science and the National Nuclear Security Administration (funder ID: 10.13039/100006168) |
| ISICitedReferencesCount | 4 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| PageCount | 8 |
| PublicationCentury | 2000 |
| PublicationDate | 2021-Nov. |
| PublicationDateYYYYMMDD | 2021-11-01 |
| PublicationDecade | 2020 |
| PublicationTitle | 2021 Workshop on Exascale MPI (ExaMPI) |
| PublicationTitleAbbrev | EXAMPI |
| PublicationYear | 2021 |
| Publisher | IEEE |
| StartPage | 46 |
| SubjectTerms | Exascale; FFT; Fast Fourier transforms; Graphics processing units; Hybrid systems; Libraries; MPI tuning; Scalability; Slabs; Software; Supercomputers |
| Title | Accelerating Multi-Process Communication for Parallel 3-D FFT |
| URI | https://ieeexplore.ieee.org/document/9652837 |
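
The abstract names inter-process communication as the scalability bottleneck of parallel FFTs, and the subject terms mention slabs. As a rough illustration only, and not the authors' implementation (which this record does not describe), the following Python sketch simulates a slab-decomposed 3-D transform on a single machine: each simulated rank transforms its own planes, an all-to-all exchange redistributes the data, and a final 1-D transform completes the result. The function names (`dft`, `dft2d`, `dft3d`, `slab_dft3d`), the naive O(n^2) DFT kernel, and the in-memory "exchange" are all assumptions made for illustration; a production code would use MPI and an optimized FFT library.

```python
import cmath


def dft(x):
    """Naive 1-D DFT of a list of complex numbers (stand-in for a real FFT kernel)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def dft2d(plane):
    """2-D DFT of one plane indexed as plane[j][k]: along k first, then along j."""
    rows = [dft(row) for row in plane]
    cols = [dft(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def dft3d(a):
    """Single-process reference: 2-D DFT of every plane, then 1-D DFT along axis 0."""
    n0, n1, n2 = len(a), len(a[0]), len(a[0][0])
    planes = [dft2d(p) for p in a]
    out = [[[0j] * n2 for _ in range(n1)] for _ in range(n0)]
    for j in range(n1):
        for k in range(n2):
            pencil = dft([planes[i][j][k] for i in range(n0)])
            for i in range(n0):
                out[i][j][k] = pencil[i]
    return out


def slab_dft3d(a, nprocs):
    """Simulated slab decomposition over `nprocs` ranks.

    Returns the transformed array and the number of complex elements that
    would cross rank boundaries during the all-to-all exchange.
    """
    n0, n1, n2 = len(a), len(a[0]), len(a[0][0])
    assert n0 % nprocs == 0 and n1 % nprocs == 0, "sizes must divide evenly"
    s0, s1 = n0 // nprocs, n1 // nprocs

    # Step 1 (local compute): rank r owns planes r*s0 .. (r+1)*s0 - 1.
    local = [[dft2d(a[i]) for i in range(r * s0, (r + 1) * s0)]
             for r in range(nprocs)]

    # Step 2 (communication): all-to-all so rank s owns rows s*s1 .. (s+1)*s1 - 1
    # of every plane. Only blocks with r != s would travel over the network.
    sent = 0
    recv = [[[None] * s1 for _ in range(n0)] for _ in range(nprocs)]
    for r in range(nprocs):
        for s in range(nprocs):
            for ii in range(s0):
                for jj in range(s1):
                    recv[s][r * s0 + ii][jj] = list(local[r][ii][s * s1 + jj])
            if r != s:
                sent += s0 * s1 * n2

    # Step 3 (local compute): each rank finishes with 1-D DFTs along axis 0.
    out = [[[0j] * n2 for _ in range(n1)] for _ in range(n0)]
    for s in range(nprocs):
        for jj in range(s1):
            for k in range(n2):
                pencil = dft([recv[s][i][jj][k] for i in range(n0)])
                for i in range(n0):
                    out[i][s * s1 + jj][k] = pencil[i]
    return out, sent


if __name__ == "__main__":
    data = [[[complex(i + 2 * j, k - i) for k in range(2)]
             for j in range(4)] for i in range(4)]
    _, moved = slab_dft3d(data, nprocs=2)
    print("complex elements exchanged between ranks:", moved)
```

In this toy model the exchanged volume is (P - 1)/P of the whole array for P ranks, so nearly all data moves once P grows. That is consistent with the abstract's point that communication, rather than local computation, tends to limit scalability.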