Auto-tuning 3-D FFT library for CUDA GPUs
Existing implementations of FFTs on GPUs are optimized for specific transform sizes, such as powers of two, and exhibit unstable, peaky performance; that is, they do not perform as well for other sizes that appear in practice. Our new auto-tuning 3-D FFT on CUDA generates high-performance CUDA kernels for FFTs...
| Published in: | Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, pp. 1-10 |
|---|---|
| Main Authors: | Nukada, Akira; Matsuoka, Satoshi |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | New York, NY, USA: ACM, 14.11.2009 |
| Series: | ACM Conferences |
| Subjects: | |
| ISBN: | 1605587443, 9781605587448 |
| ISSN: | 2167-4329 |
| Abstract | Existing implementations of FFTs on GPUs are optimized for specific transform sizes, such as powers of two, and exhibit unstable, peaky performance; that is, they do not perform as well for other sizes that appear in practice. Our new auto-tuning 3-D FFT on CUDA generates high-performance CUDA kernels for FFTs of varying transform sizes, alleviating this problem. Although auto-tuning has been implemented on GPUs for dense kernels such as DGEMM and stencils, this is the first instance in which it has been applied comprehensively to bandwidth-intensive and complex kernels such as 3-D FFTs. Bandwidth-intensive optimizations, such as selecting the number of threads and inserting padding to avoid bank conflicts in shared memory, are applied systematically. The resulting autotuner is fast and yields performance that essentially beats all 3-D FFT implementations on a single processor to date; moreover, it exhibits stable performance irrespective of problem size or the underlying GPU hardware. |
|---|---|
| Author | Matsuoka, Satoshi; Nukada, Akira |
| Author_xml | 1. Nukada, Akira (Tokyo Institute of Technology and Japan Science and Technology Agency, CREST); 2. Matsuoka, Satoshi (Tokyo Institute of Technology and National Institute of Informatics and Japan Science and Technology Agency, CREST) |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| Copyright | 2009 ACM |
| Copyright_xml | – notice: 2009 ACM |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1145/1654059.1654090 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISBN | 1605587443 9781605587448 |
| EndPage | 10 |
| ExternalDocumentID | 6375540 |
| Genre | orig-research |
| GroupedDBID | 6IE 6IF 6IL 6IN AAJGR AARBI ACM ADPZR ALMA_UNASSIGNED_HOLDINGS APO BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK GUFHI OCL RIE RIL 6IH 6IK AAWTH ABLEC ADZIZ CHZPO IEGSK IPLJI |
| ID | FETCH-LOGICAL-a357t-eb5afa79d4dda18321a48c221e193df90052922e0e86d36b8ca3f5ccda9da92a3 |
| IEDL.DBID | RIE |
| ISBN | 1605587443 9781605587448 |
| ISICitedReferencesCount | 30 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000320136800027&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 2167-4329 |
| IngestDate | Wed Jul 30 06:14:25 EDT 2025; Wed Jan 31 06:46:33 EST 2024; Wed Jan 31 06:45:55 EST 2024 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| License | Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org |
| LinkModel | DirectLink |
| MeetingName | SC '09: International Conference for High Performance Computing, Networking, Storage and Analysis |
| MergedId | FETCHMERGED-LOGICAL-a357t-eb5afa79d4dda18321a48c221e193df90052922e0e86d36b8ca3f5ccda9da92a3 |
| PageCount | 10 |
| ParticipantIDs | acm_books_10_1145_1654059_1654090 ieee_primary_6375540 acm_books_10_1145_1654059_1654090_brief |
| PublicationCentury | 2000 |
| PublicationDate | 2009-11-14 |
| PublicationDateYYYYMMDD | 2009-11-14 |
| PublicationDate_xml | – month: 11 year: 2009 text: 2009-11-14 day: 14 |
| PublicationDecade | 2000 |
| PublicationPlace | New York, NY, USA |
| PublicationPlace_xml | – name: New York, NY, USA |
| PublicationSeriesTitle | ACM Conferences |
| PublicationTitle | Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis |
| PublicationTitleAbbrev | SUPERC |
| PublicationYear | 2009 |
| Publisher | ACM |
| Publisher_xml | – name: ACM |
| SSID | ssj0000812385 ssj0003204180 |
| Score | 1.7196482 |
| SourceID | ieee acm |
| SourceType | Publisher |
| StartPage | 1 |
| SubjectTerms | Bandwidth; Codes; Computing methodologies -- Computer graphics -- Graphics systems and interfaces -- Graphics processors; Graphics processing units; Hardware; Instruction sets; Kernel; Mathematics of computing -- Mathematical analysis -- Functional analysis -- Approximation; Mathematics of computing -- Mathematical analysis -- Numerical analysis -- Computation of transforms; Mathematics of computing -- Mathematical software; Programming; Registers; Theory of computation -- Design and analysis of algorithms -- Approximation algorithms analysis; Three-dimensional displays; Transforms |
| Title | Auto-tuning 3-D FFT library for CUDA GPUs |
| URI | https://ieeexplore.ieee.org/document/6375540 |
| WOSCitedRecordID | wos000320136800027 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| linkProvider | IEEE |
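
The abstract names padding to avoid shared-memory bank conflicts as one of the tuned optimizations. The Python sketch below is only an illustrative model of that idea, not code from the paper. It assumes shared memory is split into 16 banks of 32-bit words and that 16 threads access it together, each reading the same column of a different row of a tile. The function name `bank_conflict_degree` and all parameter values are hypothetical.

```python
def bank_conflict_degree(stride, column, num_threads=16, num_banks=16):
    """Return the largest number of threads that map to one bank.

    Assumed model (not taken from the paper): word i of shared memory
    lives in bank i % num_banks, and thread `tid` reads the word at
    row `tid`, column `column` of a tile whose rows are `stride` words apart.
    """
    hits_per_bank = [0] * num_banks
    for tid in range(num_threads):
        word_index = tid * stride + column
        hits_per_bank[word_index % num_banks] += 1
    return max(hits_per_bank)


# Row stride equal to the bank count: every thread lands in the same bank.
unpadded = bank_conflict_degree(stride=16, column=3)

# One extra padding word per row shifts each row by one bank.
padded = bank_conflict_degree(stride=17, column=3)

print(unpadded, padded)  # prints "16 1"
```

In this model a degree of 1 means the access is conflict-free, while a degree of 16 means the 16 accesses would be serialized. An autotuner of the kind the abstract describes could, in principle, search over such padding amounts together with thread counts, though the paper's actual search procedure is not given in this record.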

