Accelerating Fourier and Number Theoretic Transforms using Tensor Cores and Warp Shuffles
| Published in: | 2021 30th International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 345-355 |
|---|---|
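| Main authors: | Sultan Durrani, Muhammad Saad Chughtai, Mert Hidayetoglu, Rashid Tahir, Abdul Dakkak, Lawrence Rauchwerger, Fareed Zaffar, Wen-mei Hwu |
| Format: | Conference paper |
| Language: | English |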
| Publication details: | IEEE, 1 September 2021 |
| Subject: | |
| Abstract | The discrete Fourier transform (DFT) and its specialized case, the number theoretic transform (NTT), are two important mathematical tools with applications in several areas of science and engineering. Despite their usefulness, their adoption remains a challenge because computing the DFT of a signal can be a time-consuming and expensive operation. To speed things up, fast Fourier transform (FFT) algorithms, which are reduced-complexity formulations for computing the DFT of a sequence, have been proposed and implemented for traditional processors and their corresponding instruction sets. With the rise of GPUs, NVIDIA introduced its own FFT computation library, cuFFT, which leverages the power of GPUs to compute the DFT. However, as this paper demonstrates, there is considerable room to accelerate the FFT and NTT algorithms on modern GPUs by utilizing specialized operations and architectural advancements. In particular, we present four major types of optimizations that leverage tensor cores and the warp-shuffle instruction. Through extensive evaluations, we show that our approach consistently outperforms existing GPU-based implementations, with a speedup of up to 4× for NTT and a speedup of up to 1.5× for FFT. |
|---|---|
| Author | Hwu, Wen-mei; Tahir, Rashid; Durrani, Sultan; Rauchwerger, Lawrence; Zaffar, Fareed; Chughtai, Muhammad Saad; Hidayetoglu, Mert; Dakkak, Abdul |
| Author_xml | 1. Durrani, Sultan (sultand2@illinois.edu), University of Illinois at Urbana-Champaign; 2. Chughtai, Muhammad Saad (chughtai@gatech.edu), Georgia Institute of Technology; 3. Hidayetoglu, Mert (hidayet2@illinois.edu), University of Illinois at Urbana-Champaign; 4. Tahir, Rashid (r.tahir@upm.edu.sa), University of Prince Mugrin; 5. Dakkak, Abdul (dakkak@illinois.edu), University of Illinois at Urbana-Champaign; 6. Rauchwerger, Lawrence (rwerger@illinois.edu), University of Illinois at Urbana-Champaign; 7. Zaffar, Fareed (fareed.zaffar@lums.edu.pk), Lahore University of Management Sciences; 8. Hwu, Wen-mei (hwu@illinois.edu), University of Illinois at Urbana-Champaign |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1109/PACT52795.2021.00032 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE Xplore; IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Xplore url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| EISBN | 9781665442787 1665442786 |
| EndPage | 355 |
| ExternalDocumentID | 9563043 |
| Genre | orig-research |
| GroupedDBID | 6IE 6IL ACM ALMA_UNASSIGNED_HOLDINGS APO CBEJK LHSKQ RIE RIL |
| ID | FETCH-LOGICAL-a285t-ae6e6a171bae70d525089915e4f802a2dd17dfd009a7d64b4f0f124466caf8f53 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 17 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000758464500025&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| IngestDate | Tue May 06 03:33:13 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-a285t-ae6e6a171bae70d525089915e4f802a2dd17dfd009a7d64b4f0f124466caf8f53 |
| PageCount | 11 |
| ParticipantIDs | ieee_primary_9563043 |
| PublicationCentury | 2000 |
| PublicationDate | 2021-Sept. |
| PublicationDateYYYYMMDD | 2021-09-01 |
| PublicationDate_xml | – month: 09 year: 2021 text: 2021-Sept. |
| PublicationDecade | 2020 |
| PublicationTitle | 2021 30th International Conference on Parallel Architectures and Compilation Techniques (PACT) |
| PublicationTitleAbbrev | PACT |
| PublicationYear | 2021 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssib057737291 |
| Score | 2.2654371 |
| Snippet | The discrete Fourier transform (DFT) and its specialized case, the number theoretic transform (NTT), are two important mathematical tools having applications... |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 345 |
| SubjectTerms | cyphertext; DFT; Discrete Fourier transforms; FFT; GPU; Graphics processing units; homomorphic encryption; Instruction sets; NTT; Parallel architectures; Tensor Cores; Tensors; Transforms |
| Title | Accelerating Fourier and Number Theoretic Transforms using Tensor Cores and Warp Shuffles |
| URI | https://ieeexplore.ieee.org/document/9563043 |
| WOSCitedRecordID | wos000758464500025&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| linkProvider | IEEE |