Computation–communication overlap and parameter auto-tuning for scalable parallel 3-D FFT

• We design a new method of parallel 3-D FFT based on 2-D decomposition of an input 3-D array.
• We optimize the performance through computation–communication overlap and parameter auto-tuning.
• Experimental results from two supercomputers confirm that our method is faster than three existing libraries. ...

Bibliographic Details
Published in: Journal of computational science, Vol. 14, no. C, pp. 38-50
Main Authors: Song, Sukhyun; Hollingsworth, Jeffrey K.
Format: Journal Article
Language: English
Published: Netherlands, Elsevier B.V., 01.05.2016
Elsevier
Subjects: 3-D FFT; Auto-tuning; Computation–communication overlap; MPI; Non-blocking collective
ISSN: 1877-7503, 1877-7511
Abstract
• We design a new method of parallel 3-D FFT based on 2-D decomposition of an input 3-D array.
• We optimize the performance through computation–communication overlap and parameter auto-tuning.
• Experimental results from two supercomputers confirm that our method is faster than three existing libraries.
Parallel 3-D FFT is widely used in scientific applications, so high performance on large-scale systems with many thousands of computing cores is important. This paper describes a new method for scalable, high-performance parallel 3-D FFT. We use a 2-D decomposition of 3-D arrays so that the computation can scale to a large number of cores. To achieve high performance, we use non-blocking MPI all-to-all operations and exploit computation–communication overlap. We also auto-tune our 3-D FFT code efficiently over a large parameter space, which lets us cope with the complex trade-offs involved in optimizing the code for different system environments. In experiments on two systems, our method computes parallel 3-D FFT significantly faster than three existing libraries and scales well to at least 32,768 compute cores.
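The abstract's point about 2-D decomposition can be illustrated with a small sketch. The code below is not the authors' implementation; it only shows, under stated assumptions, how a 3-D array might be split into "pencils" over a 2-D process grid and why that admits more processes than a 1-D (slab) split. The function names (`block_range`, `pencil_extents`, `max_processes`), the choice of keeping the x axis whole, and the array size 256 are all hypothetical and chosen for illustration.

```python
def block_range(n, parts, idx):
    """Half-open range [start, stop) of block `idx` when `n` items are
    split into `parts` contiguous, near-equal blocks."""
    base, extra = divmod(n, parts)
    start = idx * base + min(idx, extra)
    stop = start + base + (1 if idx < extra else 0)
    return start, stop


def pencil_extents(shape, grid, coords):
    """Local index ranges of one process under a 2-D (pencil) decomposition.

    Assumption for this sketch: the x axis stays whole on every process,
    y is split across the rows of the process grid, and z across its columns.
    """
    nx, ny, nz = shape
    p_row, p_col = grid
    r, c = coords
    return ((0, nx), block_range(ny, p_row, r), block_range(nz, p_col, c))


def max_processes(n, decomposition_dims):
    """Upper bound on usable processes for an n*n*n array when
    `decomposition_dims` axes are split and each process must own
    at least one index along every split axis."""
    return n ** decomposition_dims


if __name__ == "__main__":
    n = 256  # hypothetical array edge length, not taken from the paper
    print("1-D (slab) limit:  ", max_processes(n, 1))
    print("2-D (pencil) limit:", max_processes(n, 2))
    print("rank (1, 3) of a 2x4 grid on 8^3:",
          pencil_extents((8, 8, 8), (2, 4), (1, 3)))
```

With a slab split, an array of edge length 256 could occupy at most 256 processes, whereas the pencil split raises that bound to 65,536, which is the kind of headroom needed for core counts such as the 32,768 reported in the abstract. In a distributed FFT, moving between pencil orientations is what requires the all-to-all exchanges that the abstract says are made non-blocking and overlapped with computation.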
Author Hollingsworth, Jeffrey K.
Song, Sukhyun
Author_xml – sequence: 1
  givenname: Sukhyun
  surname: Song
  fullname: Song, Sukhyun
  email: shsong@cs.umd.edu
– sequence: 2
  givenname: Jeffrey K.
  surname: Hollingsworth
  fullname: Hollingsworth, Jeffrey K.
  email: hollings@cs.umd.edu
BackLink https://www.osti.gov/biblio/1374664 (View this record in OSTI.GOV)
CitedBy_id crossref_primary_10_1109_JPROC_2018_2870284
crossref_primary_10_1080_10618562_2021_1971202
crossref_primary_10_1016_j_jocs_2023_101945
crossref_primary_10_1109_ACCESS_2018_2878271
crossref_primary_10_1109_JPROC_2018_2841200
crossref_primary_10_1016_j_jocs_2016_04_014
Cites_doi 10.1016/j.parco.2012.12.002
10.1093/comjnl/7.4.308
10.1137/11082748X
10.1016/j.cpc.2006.12.006
10.1109/JPROC.2004.840301
ContentType Journal Article
Copyright 2015 Elsevier B.V.
Copyright_xml – notice: 2015 Elsevier B.V.
DBID AAYXX
CITATION
OTOTI
DOI 10.1016/j.jocs.2015.12.001
DatabaseName CrossRef
OSTI.GOV
DatabaseTitle CrossRef
DatabaseTitleList
DeliveryMethod fulltext_linktorsrc
Discipline Sciences (General)
Business
EISSN 1877-7511
EndPage 50
ExternalDocumentID 1374664
10_1016_j_jocs_2015_12_001
S187775031530048X
ISICitedReferencesCount 8
ISSN 1877-7503
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue C
Keywords Non-blocking collective
3-D FFT
Computation–communication overlap
Auto-tuning
MPI
Language English
LinkModel OpenURL
Notes USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
ER25763; ER26054; AC02-05CH11231
OpenAccessLink https://www.osti.gov/biblio/1374664
PageCount 13
ParticipantIDs osti_scitechconnect_1374664
crossref_primary_10_1016_j_jocs_2015_12_001
crossref_citationtrail_10_1016_j_jocs_2015_12_001
elsevier_sciencedirect_doi_10_1016_j_jocs_2015_12_001
PublicationCentury 2000
PublicationDate May 2016
2016-05-00
2016-05-01
PublicationDateYYYYMMDD 2016-05-01
PublicationDate_xml – month: 05
  year: 2016
  text: May 2016
PublicationDecade 2010
PublicationPlace Netherlands
PublicationPlace_xml – name: Netherlands
PublicationTitle Journal of computational science
PublicationYear 2016
Publisher Elsevier B.V
Elsevier
Publisher_xml – name: Elsevier B.V
– name: Elsevier
SSID ssj0000388913
SourceID osti
crossref
elsevier
SourceType Open Access Repository
Enrichment Source
Index Database
Publisher
StartPage 38
SubjectTerms 3-D FFT
Auto-tuning
Computation–communication overlap
MPI
Non-blocking collective
Title Computation–communication overlap and parameter auto-tuning for scalable parallel 3-D FFT
URI https://dx.doi.org/10.1016/j.jocs.2015.12.001
https://www.osti.gov/biblio/1374664
Volume 14