An Optimized FFT-Based Direct Poisson Solver on CUDA GPUs

Bibliographic Details
Published in: IEEE Transactions on Parallel and Distributed Systems, Vol. 25, no. 3, pp. 550-559
Main Authors: Wu, Jing; JaJa, Joseph; Balaras, Elias
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 1 March 2014
ISSN: 1045-9219, 1558-2183
Abstract: A highly multithreaded FFT-based direct Poisson solver that makes effective use of the capabilities of the current NVIDIA graphics processing units (GPUs) is presented. Our algorithms carefully manage the multiple layers of the memory hierarchy of the GPUs such that almost all the global memory accesses are coalesced into 128-byte device memory transactions, and all computations are carried out directly on the registers. A new strategy to interleave the FFT computation along each dimension with other computations is used to minimize the total number of accesses to the 3D grid. We illustrate the performance of our algorithms on the NVIDIA Tesla and Fermi architectures for a wide range of grid sizes, up to the largest size that can fit on the device memory ($512 \times 512 \times 512$ on the Tesla C1060/C2050 and $512 \times 256 \times 256$ on the GeForce GTX 280/480). We achieve up to 140 GFLOPS and a bandwidth of 70 GB/s on the Tesla C1060, and up to 375 GFLOPS with a bandwidth of 120 GB/s on the GTX 480. The performance of our algorithms is superior to what can be achieved using the CUDA FFT library in combination with well-known parallel algorithms for solving tridiagonal linear systems of equations.
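Illustrative sketch (not the authors' GPU code): the abstract pairs Fourier transforms with tridiagonal solves, which is the general shape of an FFT-based direct Poisson solver. The Python below shows that structure on a small 2D grid under assumptions the record does not state: periodic boundaries in x, homogeneous Dirichlet boundaries in y, unit grid spacing, a naive DFT standing in for the FFT, and the sequential Thomas algorithm standing in for a parallel tridiagonal solver. The names `dft`, `thomas`, `solve_poisson`, and `apply_laplacian` are hypothetical.

```python
import cmath
import math


def dft(values, sign):
    """Naive O(n^2) DFT. sign=-1 is the forward transform, +1 the unscaled inverse."""
    n = len(values)
    return [
        sum(values[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]


def thomas(diag, rhs):
    """Solve x[j-1] + diag*x[j] + x[j+1] = rhs[j], with x[-1] = x[n] = 0 (assumed)."""
    n = len(rhs)
    c = [0j] * n
    d = [0j] * n
    c[0] = 1.0 / diag
    d[0] = rhs[0] / diag
    for j in range(1, n):  # forward elimination
        denom = diag - c[j - 1]
        c[j] = 1.0 / denom
        d[j] = (rhs[j] - d[j - 1]) / denom
    x = [0j] * n
    x[-1] = d[-1]
    for j in range(n - 2, -1, -1):  # back substitution
        x[j] = d[j] - c[j] * x[j + 1]
    return x


def apply_laplacian(u):
    """5-point Laplacian: periodic in x (first index), zero Dirichlet in y."""
    nx, ny = len(u), len(u[0])
    out = [[0.0] * ny for _ in range(nx)]
    for i in range(nx):
        for j in range(ny):
            down = u[i][j - 1] if j > 0 else 0.0
            up = u[i][j + 1] if j < ny - 1 else 0.0
            out[i][j] = u[(i - 1) % nx][j] + u[(i + 1) % nx][j] + down + up - 4.0 * u[i][j]
    return out


def solve_poisson(f):
    """Return u such that apply_laplacian(u) == f, up to rounding error."""
    nx, ny = len(f), len(f[0])
    # Step 1: transform the right-hand side along x for every y index.
    f_hat = [[0j] * ny for _ in range(nx)]
    for j in range(ny):
        column = dft([f[i][j] for i in range(nx)], -1)
        for k in range(nx):
            f_hat[k][j] = column[k]
    # Step 2: each wavenumber k decouples into one tridiagonal system along y.
    u_hat = []
    for k in range(nx):
        lam = 2.0 * math.cos(2.0 * math.pi * k / nx) - 2.0  # eigenvalue of the x-difference
        u_hat.append(thomas(lam - 2.0, f_hat[k]))
    # Step 3: inverse transform along x and keep the real part.
    u = [[0.0] * ny for _ in range(nx)]
    for j in range(ny):
        column = dft([u_hat[k][j] for k in range(nx)], +1)
        for i in range(nx):
            u[i][j] = column[i].real / nx
    return u
```

Because the transform diagonalizes the periodic direction exactly, the result matches the discrete operator to rounding error; a quick check is to apply `apply_laplacian` to a known grid and confirm that `solve_poisson` recovers it.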
Author Affiliations:
– Wu, Jing (jingwu@umiacs.umd.edu), Dept. of Electr. & Comput. Eng., Univ. of Maryland, College Park, MD, USA
– JaJa, Joseph (joseph@umiacs.umd.edu), Dept. of Electr. & Comput. Eng., Univ. of Maryland, College Park, MD, USA
– Balaras, Elias (balaras@gwu.edu), George Washington Univ., Washington, DC, USA
CODEN: ITDSEO
Copyright: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), Mar 2014
DOI: 10.1109/TPDS.2013.53
Discipline: Engineering; Computer Science
Peer Reviewed: Yes
Scholarly: Yes
License: https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
Subject Terms: Algorithms; Bandwidth; Computation; Computer architecture; Economic impact; elliptic equations; Equations; Fast-Fourier transforms; Graphics processing units; Instruction sets; Kernel; Linear systems; Mathematical analysis; Memory devices; parallel and vector implementations; Solvers; Vectors
URI: https://ieeexplore.ieee.org/document/6470608
https://www.proquest.com/docview/1495452212
https://www.proquest.com/docview/1513457843