Low communication FMM-accelerated FFT on GPUs
Communication-avoiding algorithms have been a subject of growing interest in the last decade due to the growth of distributed memory systems and the disproportionate increase of computational throughput to communication bandwidth. For distributed 1D FFTs, communication costs quickly dominate executi...
| Published in: | International Conference for High Performance Computing, Networking, Storage and Analysis (Online), pp. 1-11 |
|---|---|
| Main author: | Cecka, Cris |
| Format: | Conference proceeding |
| Language: | English |
| Published: | New York, NY, USA: ACM, 12 November 2017 |
| Series: | ACM Conferences |
| Subjects: | |
| ISBN: | 9781450351140, 145035114X |
| ISSN: | 2167-4337 |
| Online access: | Full text |
| Abstract | Communication-avoiding algorithms have been a subject of growing interest in the last decade due to the growth of distributed memory systems and the disproportionate increase of computational throughput to communication bandwidth. For distributed 1D FFTs, communication costs quickly dominate execution time as all industry-standard implementations perform three all-to-all transpositions of the data. In this work, we reformulate an existing algorithm that employs the Fast Multipole Method to reduce the communication requirements to approximately a single all-to-all transpose. We present a detailed and clear implementation strategy that relies heavily on existing library primitives, demonstrate that our strategy achieves consistent speed-ups between 1.3x and 2.2x against cuFFTXT on up to eight NVIDIA Tesla P100 GPUs, and develop an accurate compute model to analyze the performance and dependencies of the algorithm. |
|---|---|
| Author | Cecka, Cris |
| Author_xml | – sequence: 1 givenname: Cris surname: Cecka fullname: Cecka, Cris email: ccecka@nvidia.com organization: NVIDIA |
| ContentType | Conference Proceeding |
| Copyright | 2017 ACM |
| Copyright_xml | – notice: 2017 ACM |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1145/3126908.3126919 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE/IET Electronic Library; IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE/IET Electronic Library url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISBN | 9781450351140, 145035114X |
| EISSN | 2167-4337 |
| EndPage | 11 |
| ExternalDocumentID | 9926122 |
| Genre | orig-research |
| GroupedDBID | 6IE 6IF 6IL 6IN ABLEC ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK IEGSK OCL RIB RIC RIE RIL 6IH 6IK AAWTH ADZIZ CHZPO IPLJI |
| ID | FETCH-LOGICAL-a317t-f1e6589ac0a9d91551d38ebafb1d05fca8287d8c04e783333f4dafaa3d4b62293 |
| IEDL.DBID | RIE |
| ISBN | 9781450351140, 145035114X |
| ISICitedReferencesCount | 6 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000458161700054&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| IngestDate | Wed Aug 27 02:19:14 EDT 2025 Wed Jan 31 06:44:57 EST 2024 Wed Jan 31 06:44:12 EST 2024 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Keywords | FFT; multi-GPU; GPU; FMM |
| Language | English |
| License | Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org. |
| LinkModel | DirectLink |
| MeetingName | SC '17: The International Conference for High Performance Computing, Networking, Storage and Analysis |
| MergedId | FETCHMERGED-LOGICAL-a317t-f1e6589ac0a9d91551d38ebafb1d05fca8287d8c04e783333f4dafaa3d4b62293 |
| PageCount | 11 |
| ParticipantIDs | acm_books_10_1145_3126908_3126919_brief ieee_primary_9926122 acm_books_10_1145_3126908_3126919 |
| PublicationCentury | 2000 |
| PublicationDate | 20171112 2017-Nov.-12 |
| PublicationDateYYYYMMDD | 2017-11-12 |
| PublicationDate_xml | – month: 11 year: 2017 text: 20171112 day: 12 |
| PublicationDecade | 2010 |
| PublicationPlace | New York, NY, USA |
| PublicationPlace_xml | – name: New York, NY, USA |
| PublicationSeriesTitle | ACM Conferences |
| PublicationTitle | International Conference for High Performance Computing, Networking, Storage and Analysis (Online) |
| PublicationTitleAbbrev | SC |
| PublicationYear | 2017 |
| Publisher | ACM |
| Publisher_xml | – name: ACM |
| SSID | ssib050161540 ssj0003204180 |
| Score | 1.699429 |
| SourceID | ieee acm |
| SourceType | Publisher |
| StartPage | 1 |
| SubjectTerms | Analytical models; Computational modeling; Costs; FFT; FMM; GPU; Memory management; Multi-GPU; Predictive models; Tensors; Throughput |
| Title | Low communication FMM-accelerated FFT on GPUs |
| URI | https://ieeexplore.ieee.org/document/9926122 |
| WOSCitedRecordID | wos000458161700054&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
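
The abstract notes that standard distributed 1D FFTs perform three all-to-all transpositions. As a rough illustration of where those three transposes come from, here is a minimal single-process sketch of the textbook "six-step" FFT factorization. It is not the FMM-based method of the paper, and it assumes that this factorization is representative of the implementations the abstract refers to. The names `dft`, `transpose`, and `six_step_fft` are hypothetical helpers introduced for this sketch; each call to `transpose` stands in for what would be an all-to-all exchange when the rows are spread across devices.

```python
import cmath


def dft(v):
    """Naive O(n^2) DFT of a list; a stand-in for a local (on-device) FFT."""
    n = len(v)
    return [
        sum(v[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def transpose(m):
    """Matrix transpose; in a distributed run this would be an all-to-all."""
    return [list(col) for col in zip(*m)]


def six_step_fft(x, n1, n2):
    """Length n1*n2 DFT via the six-step factorization.

    Returns (spectrum, number_of_transposes).
    """
    n = n1 * n2
    assert len(x) == n
    transposes = 0

    # View x as an n1-by-n2 matrix: a[j1][j2] = x[n2*j1 + j2].
    a = [x[r * n2:(r + 1) * n2] for r in range(n1)]

    # Transpose 1: bring the j1 index into the rows so it can be transformed.
    a = transpose(a)                      # a[j2][j1]
    transposes += 1
    a = [dft(row) for row in a]           # a[j2][k1]

    # Twiddle factors exp(-2*pi*i*j2*k1/n) couple the two sub-transforms.
    a = [
        [a[j2][k1] * cmath.exp(-2j * cmath.pi * j2 * k1 / n) for k1 in range(n1)]
        for j2 in range(n2)
    ]

    # Transpose 2: bring the j2 index into the rows.
    a = transpose(a)                      # a[k1][j2]
    transposes += 1
    a = [dft(row) for row in a]           # a[k1][k2]

    # Transpose 3: restore natural output order, X[k1 + n1*k2].
    a = transpose(a)                      # a[k2][k1]
    transposes += 1
    return [v for row in a for v in row], transposes
```

Calling `six_step_fft(x, 3, 4)` on a 12-element input yields the same spectrum as a direct DFT of `x`, along with a transpose count of 3. The local transforms operate on whole rows, so in this formulation only the transposes would require inter-device communication.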

