Parallel Fast Multipole Method accelerated FFT on HPC clusters

Bibliographic Details
Published in: Parallel Computing, Vol. 104-105, p. 102783
Main authors: Mehta, Chahak; Karthi, Amarnath; Jetly, Vishrut; Chaudhury, Bhaskar
Format: Journal Article
Language: English
Published: Elsevier B.V., 1 July 2021
ISSN: 0167-8191, 1872-7336
Abstract As distributed systems grow in size, so does the risk of communication bottlenecks, and the past decade has seen growing interest in communication-avoiding algorithms. The distributed-memory Fast Fourier Transform (FFT) is an important algorithm that suffers from major communication bottlenecks. In this work, we examine FMM-FFT, an existing communication-avoiding alternative to the FFT that uses the Fast Multipole Method (FMM) to reduce communication to a single all-to-all communication. We present a detailed implementation of FMM-FFT that relies on modern libraries and demonstrate it on two distinct distributed-memory architectures: a traditional Intel Xeon-based HPC cluster and a Beowulf cluster. We show that while FMM-FFT is significantly slower than the FFT on the traditional HPC cluster, on the Beowulf cluster it outperforms the standard FFT, consistently achieving speedups of 1.5x or more against FFTW. We then show how the communication-to-computation cost metric helps explain the performance of FMM-FFT relative to the standard FFT. The source code for this work is being made publicly available on GitHub under a permissive open-source licence.
ArticleNumber 102783
Author Chaudhury, Bhaskar
Karthi, Amarnath
Mehta, Chahak
Jetly, Vishrut
Author_xml – sequence: 1
  givenname: Chahak
  surname: Mehta
  fullname: Mehta, Chahak
  email: 201501422@daiict.ac.in
– sequence: 2
  givenname: Amarnath
  surname: Karthi
  fullname: Karthi, Amarnath
  email: 201501005@daiict.ac.in
– sequence: 3
  givenname: Vishrut
  surname: Jetly
  fullname: Jetly, Vishrut
  email: 201601449@daiict.ac.in
– sequence: 4
  givenname: Bhaskar
  surname: Chaudhury
  fullname: Chaudhury, Bhaskar
  email: bhaskar_chaudhury@daiict.ac.in
Cites_doi 10.1137/0733082
10.1090/S0025-5718-1994-1185244-1
10.1137/0914081
10.1049/el:19840012
10.1090/S0025-5718-1965-0178586-1
10.1016/S0167-8191(84)90413-7
10.1137/S1064827597316266
ContentType Journal Article
Copyright 2021 Elsevier B.V.
Copyright_xml – notice: 2021 Elsevier B.V.
DBID AAYXX
CITATION
DOI 10.1016/j.parco.2021.102783
DatabaseName CrossRef
DatabaseTitle CrossRef
DatabaseTitleList
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISSN 1872-7336
ExternalDocumentID 10_1016_j_parco_2021_102783
S0167819121000405
ISICitedReferencesCount 1
ISSN 0167-8191
IngestDate Sat Nov 29 07:24:06 EST 2025
Fri Feb 23 02:45:35 EST 2024
IsPeerReviewed true
IsScholarly true
Keywords Fast Fourier Transform
Fast Multipole Method
Parallel programming
Communication avoiding algorithms
High performance computing
Beowulf cluster
Language English
LinkModel OpenURL
ParticipantIDs crossref_primary_10_1016_j_parco_2021_102783
elsevier_sciencedirect_doi_10_1016_j_parco_2021_102783
PublicationCentury 2000
PublicationDate July 2021
2021-07-00
PublicationDateYYYYMMDD 2021-07-01
PublicationDate_xml – month: 07
  year: 2021
  text: July 2021
PublicationDecade 2020
PublicationTitle Parallel computing
PublicationYear 2021
Publisher Elsevier B.V
Publisher_xml – name: Elsevier B.V
SSID ssj0006480
Score 2.2827442
SourceID crossref
elsevier
SourceType Index Database
Publisher
StartPage 102783
SubjectTerms Beowulf cluster
Communication avoiding algorithms
Fast Fourier Transform
Fast Multipole Method
High performance computing
Parallel programming
Title Parallel Fast Multipole Method accelerated FFT on HPC clusters
URI https://dx.doi.org/10.1016/j.parco.2021.102783
Volume 104-105
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVESC
  databaseName: Elsevier SD Freedom Collection Journals 2021
  customDbUrl:
  eissn: 1872-7336
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0006480
  issn: 0167-8191
  databaseCode: AIEXJ
  dateStart: 19950101
  isFulltext: true
  titleUrlDefault: https://www.sciencedirect.com
  providerName: Elsevier