Parallel Fast Multipole Method accelerated FFT on HPC clusters
With increasing sizes of distributed systems, there comes an increased risk of communication bottlenecks. In the past decade there has been a growing interest in communication-avoiding algorithms. The distributed memory Fast Fourier Transform is an important algorithm which suffers from major commun...
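The communication bottleneck mentioned above can be illustrated with a small single-process sketch. It models the textbook transpose-based ("six-step") formulation of a 1D FFT, in which each global transpose stands in for one all-to-all exchange on a distributed-memory machine. This is not the authors' implementation and contains no FMM stage. The function name `six_step_fft`, the factorisation `n1 * n2`, and the transpose counter are assumptions made purely for illustration.

```python
import numpy as np


def six_step_fft(x, n1, n2):
    """Single-process model of a transpose-based 1D FFT of length n1 * n2.

    Each explicit transpose marks a point where a distributed-memory
    implementation would typically need an all-to-all exchange.
    Returns the transform and the number of transposes performed.
    """
    x = np.asarray(x, dtype=complex)
    n = n1 * n2
    if x.size != n:
        raise ValueError("len(x) must equal n1 * n2")

    transposes = 0

    # View the input as an n1 x n2 matrix: a[i1, i2] = x[n2 * i1 + i2].
    a = x.reshape(n1, n2)

    # Transpose 1: make the length-n1 transforms contiguous along rows.
    t = a.T.copy()
    transposes += 1

    # Local step: n2 independent FFTs of length n1.
    t = np.fft.fft(t, axis=1)

    # Local step: twiddle factors exp(-2*pi*i * i2 * k1 / n).
    i2 = np.arange(n2).reshape(-1, 1)
    k1 = np.arange(n1).reshape(1, -1)
    t *= np.exp(-2j * np.pi * i2 * k1 / n)

    # Transpose 2: make the length-n2 transforms contiguous along rows.
    u = t.T.copy()
    transposes += 1

    # Local step: n1 independent FFTs of length n2.
    u = np.fft.fft(u, axis=1)

    # Transpose 3: restore natural output order, X[k1 + n1 * k2].
    out = u.T.copy().reshape(-1)
    transposes += 1

    return out, transposes


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    signal = rng.standard_normal(128) + 1j * rng.standard_normal(128)
    result, count = six_step_fft(signal, 8, 16)
    print("matches numpy.fft.fft:", np.allclose(result, np.fft.fft(signal)))
    print("global transposes:", count)
```

In this model the local FFT and twiddle steps need no communication, so the cost of the transposes relative to the local arithmetic is what decides how well the transform scales on a given interconnect.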
| Published in: | Parallel computing Vol. 104-105; p. 102783 |
|---|---|
| Main Authors: | Mehta, Chahak; Karthi, Amarnath; Jetly, Vishrut; Chaudhury, Bhaskar |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V, 01.07.2021 |
| ISSN: | 0167-8191, 1872-7336 |
| Abstract | As distributed systems grow, so does the risk of communication bottlenecks, and the past decade has seen growing interest in communication-avoiding algorithms. The distributed-memory Fast Fourier Transform (FFT) is an important algorithm that suffers from major communication bottlenecks. In this work, we examine FMM-FFT, an existing communication-avoiding alternative to the FFT that uses the Fast Multipole Method (FMM) to reduce communication to a single all-to-all exchange. We present a detailed implementation of FMM-FFT built on modern libraries and demonstrate it on two distinct distributed-memory architectures: a traditional Intel Xeon based HPC cluster and a Beowulf cluster. We show that while FMM-FFT is significantly slower than FFT on the traditional HPC cluster, on the Beowulf cluster it outperforms standard FFT, consistently achieving speedups of 1.5x or more over FFTW. We then show how the communication-to-computation cost metric helps explain the performance of FMM-FFT relative to standard FFT. The source code for this work is being made publicly available on GitHub under a permissive open-source licence. |
|---|---|
| ArticleNumber | 102783 |
| Author | Chaudhury, Bhaskar; Karthi, Amarnath; Mehta, Chahak; Jetly, Vishrut |
| ContentType | Journal Article |
| Copyright | 2021 Elsevier B.V. |
| DOI | 10.1016/j.parco.2021.102783 |
| Discipline | Computer Science |
| EISSN | 1872-7336 |
| ISICitedReferencesCount | 1 |
| ISSN | 0167-8191 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | Fast Fourier Transform; Fast Multipole Method; Parallel programming; Communication avoiding algorithms; High performance computing; Beowulf cluster |
| Language | English |
| PublicationDate | July 2021 |
| PublicationTitle | Parallel computing |
| PublicationYear | 2021 |
| Publisher | Elsevier B.V |
| StartPage | 102783 |
| Title | Parallel Fast Multipole Method accelerated FFT on HPC clusters |
| URI | https://dx.doi.org/10.1016/j.parco.2021.102783 |
| Volume | 104-105 |