Computation–communication overlap and parameter auto-tuning for scalable parallel 3-D FFT
•We design a new method of parallel 3-D FFT based on 2-D decomposition of an input 3-D array.•We optimize the performance through computation–communication overlap and parameter auto-tuning.•Experimental results from two supercomputers confirm that our method is faster than three existing libraries....
| Published in: | Journal of computational science Vol. 14; no. C; pp. 38 - 50 |
|---|---|
| Main Authors: | Song, Sukhyun; Hollingsworth, Jeffrey K. |
| Format: | Journal Article |
| Language: | English |
| Published: | Netherlands, Elsevier B.V, 01.05.2016, Elsevier |
| Subjects: | 3-D FFT; Auto-tuning; Computation–communication overlap; MPI; Non-blocking collective |
| ISSN: | 1877-7503, 1877-7511 |
| Abstract | • We design a new method of parallel 3-D FFT based on 2-D decomposition of an input 3-D array. • We optimize the performance through computation–communication overlap and parameter auto-tuning. • Experimental results from two supercomputers confirm that our method is faster than three existing libraries.
Parallel 3-D FFT is widely used in scientific applications, so it is important to achieve high performance on large-scale systems with many thousands of computing cores. This paper describes a new method for scalable high-performance parallel 3-D FFT. We use a 2-D decomposition of 3-D arrays to increase scaling to a large number of cores. To achieve high performance, we use non-blocking MPI all-to-all operations and exploit computation–communication overlap. We also auto-tune our 3-D FFT code efficiently in a large parameter space and cope with the complex trade-offs in optimizing our code for various system environments. According to experimental results from two systems, our method computes parallel 3-D FFT significantly faster than three existing libraries and scales well to at least 32,768 compute cores. |
|---|---|
| Author | Hollingsworth, Jeffrey K.; Song, Sukhyun |
| Author_xml | 1. Song, Sukhyun (shsong@cs.umd.edu); 2. Hollingsworth, Jeffrey K. (hollings@cs.umd.edu) |
| BackLink | https://www.osti.gov/biblio/1374664$$D View this record in Osti.gov |
| CitedBy_id | crossref_primary_10_1109_JPROC_2018_2870284 crossref_primary_10_1080_10618562_2021_1971202 crossref_primary_10_1016_j_jocs_2023_101945 crossref_primary_10_1109_ACCESS_2018_2878271 crossref_primary_10_1109_JPROC_2018_2841200 crossref_primary_10_1016_j_jocs_2016_04_014 |
| Cites_doi | 10.1016/j.parco.2012.12.002 10.1093/comjnl/7.4.308 10.1137/11082748X 10.1016/j.cpc.2006.12.006 10.1109/JPROC.2004.840301 |
| ContentType | Journal Article |
| Copyright | 2015 Elsevier B.V. |
| Copyright_xml | – notice: 2015 Elsevier B.V. |
| DBID | AAYXX CITATION OTOTI |
| DOI | 10.1016/j.jocs.2015.12.001 |
| DatabaseName | CrossRef OSTI.GOV |
| DatabaseTitle | CrossRef |
| Discipline | Sciences (General) Business |
| EISSN | 1877-7511 |
| EndPage | 50 |
| ExternalDocumentID | 1374664 10_1016_j_jocs_2015_12_001 S187775031530048X |
| ISICitedReferencesCount | 8 |
| ISSN | 1877-7503 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | C |
| Keywords | Non-blocking collective; 3-D FFT; Computation–communication overlap; Auto-tuning; MPI |
| Language | English |
| Notes | USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) ER25763; ER26054; AC02-05CH11231 |
| OpenAccessLink | https://www.osti.gov/biblio/1374664 |
| PageCount | 13 |
| PublicationCentury | 2000 |
| PublicationDate | May 2016 2016-05-00 2016-05-01 |
| PublicationDateYYYYMMDD | 2016-05-01 |
| PublicationDate_xml | – month: 05 year: 2016 text: May 2016 |
| PublicationDecade | 2010 |
| PublicationPlace | Netherlands |
| PublicationPlace_xml | – name: Netherlands |
| PublicationTitle | Journal of computational science |
| PublicationYear | 2016 |
| Publisher | Elsevier B.V Elsevier |
| Publisher_xml | – name: Elsevier B.V – name: Elsevier |
| SourceID | osti crossref elsevier |
| SourceType | Open Access Repository Enrichment Source Index Database Publisher |
| StartPage | 38 |
| SubjectTerms | 3-D FFT; Auto-tuning; Computation–communication overlap; MPI; Non-blocking collective |
| Title | Computation–communication overlap and parameter auto-tuning for scalable parallel 3-D FFT |
| URI | https://dx.doi.org/10.1016/j.jocs.2015.12.001 https://www.osti.gov/biblio/1374664 |
| Volume | 14 |
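
The abstract's 2-D decomposition of a 3-D array can be illustrated with a small sketch. With a 1-D (slab) decomposition, an N×N×N array can be split across at most N processes; splitting two axes over a p_row × p_col process grid raises that limit to N². The Python below is not the authors' code: the function names `block_range` and `pencil_extents`, the choice to keep the first axis whole, and the way remainders are spread over the lowest-numbered processes are all assumptions made for illustration.

```python
def block_range(n, parts, index):
    """Return the half-open range [start, stop) owned by block `index`
    when n items are split into `parts` nearly equal contiguous blocks."""
    base, extra = divmod(n, parts)
    # The first `extra` blocks each take one additional item.
    start = index * base + min(index, extra)
    stop = start + base + (1 if index < extra else 0)
    return start, stop


def pencil_extents(shape, grid, coords):
    """Local index ranges for the process at `coords` in a 2-D process grid.

    shape  -- global array size (nx, ny, nz)
    grid   -- process grid size (p_row, p_col)
    coords -- this process's position (row, col) in the grid

    The first axis is kept whole (an assumption of this sketch), so 1-D FFTs
    along it need no communication; the other two axes are split.
    """
    nx, ny, nz = shape
    p_row, p_col = grid
    row, col = coords
    return (0, nx), block_range(ny, p_row, row), block_range(nz, p_col, col)


if __name__ == "__main__":
    # A 256^3 array on a 4 x 8 grid: each process holds a 256 x 64 x 32 pencil.
    print(pencil_extents((256, 256, 256), (4, 8), (1, 2)))
```

In a full parallel 3-D FFT, each process would transform along its whole axis and then exchange data with an all-to-all step so that a different axis becomes whole; that exchange is the communication the paper overlaps with computation.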