An Optimized FFT-Based Direct Poisson Solver on CUDA GPUs
| Published in: | IEEE Transactions on Parallel and Distributed Systems, Vol. 25, no. 3, pp. 550-559 |
|---|---|
| Main Authors: | Jing Wu, Joseph JaJa, Elias Balaras |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 1 March 2014 |
| ISSN: | 1045-9219, 1558-2183 |
| Abstract | A highly multithreaded FFT-based direct Poisson solver that makes effective use of the capabilities of the current NVIDIA graphics processing units (GPUs) is presented. Our algorithms carefully manage the multiple layers of the memory hierarchy of the GPUs such that almost all the global memory accesses are coalesced into 128-byte device memory transactions, and all computations are carried out directly on the registers. A new strategy to interleave the FFT computation along each dimension with other computations is used to minimize the total number of accesses to the 3D grid. We illustrate the performance of our algorithms on the NVIDIA Tesla and Fermi architectures for a wide range of grid sizes, up to the largest size that can fit on the device memory ($512\times 512\times 512$ on the Tesla C1060/C2050 and $512\times 256\times 256$ on the GeForce GTX 280/480). We achieve up to 140 GFLOPS and a bandwidth of 70 GB/s on the Tesla C1060, and up to 375 GFLOPS with a bandwidth of 120 GB/s on the GTX 480. The performance of our algorithms is superior to what can be achieved using the CUDA FFT library in combination with well-known parallel algorithms for solving tridiagonal linear systems of equations. |
|---|---|
| Author | Wu, Jing; JaJa, Joseph; Balaras, Elias |
| Author affiliations | Jing Wu (jingwu@umiacs.umd.edu), Dept. of Electr. & Comput. Eng., Univ. of Maryland, College Park, MD, USA; Joseph JaJa (joseph@umiacs.umd.edu), Dept. of Electr. & Comput. Eng., Univ. of Maryland, College Park, MD, USA; Elias Balaras (balaras@gwu.edu), George Washington Univ., Washington, DC, USA |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Mar 2014 |
| DOI | 10.1109/TPDS.2013.53 |
| SubjectTerms | Algorithms; Bandwidth; Computation; Computer architecture; Economic impact; elliptic equations; Equations; Fast-Fourier transforms; Graphics processing units; Instruction sets; Kernel; Linear systems; Mathematical analysis; Memory devices; parallel and vector implementations; Solvers; Vectors |
| URI | https://ieeexplore.ieee.org/document/6470608 https://www.proquest.com/docview/1495452212 https://www.proquest.com/docview/1513457843 |
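
The abstract contrasts the solver with a baseline that combines an FFT library with parallel tridiagonal solvers. As a rough illustration of that general structure only (not the authors' GPU implementation), the following sketch solves a small 2D discrete Poisson problem by transforming along one direction and solving an independent tridiagonal system per wavenumber along the other. The 2D setting, the periodic boundary in x, the homogeneous Dirichlet boundary in y, and the names `dft`, `thomas`, `solve_poisson` and `apply_laplacian` are all assumptions made for this example; a naive O(n^2) DFT stands in for a real FFT to keep the code dependency-free.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def thomas(a, b, c, d):
    """Solve a tridiagonal system with constant sub-, main and super-diagonal a, b, c."""
    n = len(d)
    cp = [0j] * n
    dp = [0j] * n
    cp[0] = c / b
    dp[0] = d[0] / b
    for j in range(1, n):  # forward elimination
        denom = b - a * cp[j - 1]
        cp[j] = c / denom
        dp[j] = (d[j] - a * dp[j - 1]) / denom
    x = [0j] * n
    x[-1] = dp[-1]
    for j in range(n - 2, -1, -1):  # back substitution
        x[j] = dp[j] - cp[j] * x[j + 1]
    return x


def solve_poisson(f, hx, hy):
    """Solve the 5-point discrete Poisson equation for u given f[i][j].

    Index i (x) is periodic; index j (y) covers interior points with u = 0
    on both y boundaries.
    """
    nx, ny = len(f), len(f[0])
    # Transform every y-line along x: f_hat[j][k].
    f_hat = [dft([f[i][j] for i in range(nx)]) for j in range(ny)]
    u_hat = [[0j] * nx for _ in range(ny)]
    for k in range(nx):
        # Eigenvalue of the periodic second-difference operator for mode k.
        lam = (2.0 * math.cos(2.0 * math.pi * k / nx) - 2.0) / hx ** 2
        col = thomas(1.0 / hy ** 2, lam - 2.0 / hy ** 2, 1.0 / hy ** 2,
                     [f_hat[j][k] for j in range(ny)])
        for j in range(ny):
            u_hat[j][k] = col[j]
    # Inverse transform along x to recover the real solution.
    u = [[0.0] * ny for _ in range(nx)]
    for j in range(ny):
        line = dft(u_hat[j], inverse=True)
        for i in range(nx):
            u[i][j] = line[i].real
    return u


def apply_laplacian(u, hx, hy):
    """Apply the same 5-point operator to u (used to build a consistent test problem)."""
    nx, ny = len(u), len(u[0])
    f = [[0.0] * ny for _ in range(nx)]
    for i in range(nx):
        for j in range(ny):
            up = u[i][j + 1] if j + 1 < ny else 0.0
            dn = u[i][j - 1] if j > 0 else 0.0
            f[i][j] = ((u[(i + 1) % nx][j] - 2 * u[i][j] + u[(i - 1) % nx][j]) / hx ** 2
                       + (up - 2 * u[i][j] + dn) / hy ** 2)
    return f
```

One way to check the sketch is to pick any grid function `u_true`, compute `f = apply_laplacian(u_true, hx, hy)`, and confirm that `solve_poisson(f, hx, hy)` reproduces `u_true` up to rounding error. Each wavenumber's tridiagonal system is independent of the others, which is the property that makes this class of solver attractive for highly multithreaded hardware.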