Accelerating Fourier and Number Theoretic Transforms using Tensor Cores and Warp Shuffles

Detailed bibliography
Published in: 2021 30th International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 345-355
Main authors: Durrani, Sultan; Chughtai, Muhammad Saad; Hidayetoglu, Mert; Tahir, Rashid; Dakkak, Abdul; Rauchwerger, Lawrence; Zaffar, Fareed; Hwu, Wen-mei
Format: Conference paper
Language: English
Publisher details: IEEE, 01.09.2021
Abstract The discrete Fourier transform (DFT) and its specialized case, the number theoretic transform (NTT), are two important mathematical tools having applications in several areas of science and engineering. However, despite their usefulness and utility, their adoption continues to be a challenge as computing the DFT of a signal can be a time-consuming and expensive operation. To speed things up, fast Fourier transform (FFT) algorithms, which are reduced-complexity formulations for computing the DFT of a sequence, have been proposed and implemented for traditional processors and their corresponding instruction sets. With the rise of GPUs, NVIDIA introduced its own FFT computation library called cuFFT, which leverages the power of GPUs to compute the DFT. However, as this paper demonstrates, there is a lot of room for improvement to accelerate the FFT and NTT algorithms on modern GPUs by utilizing specialized operations and architectural advancements. In particular, we present four major types of optimizations that leverage tensor cores and the warp-shuffle instruction. Through extensive evaluations, we show that our approach consistently outperforms existing GPU-based implementations with a speedup of up to 4× for NTT and a speedup of up to 1.5× for FFT.
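The abstract describes the NTT as a specialized case of the DFT and the FFT as a reduced-complexity way of computing it. The minimal Python sketch below makes that relationship concrete; it is an illustration only, not the authors' tensor-core or warp-shuffle GPU implementation. It assumes a textbook radix-2 Cooley-Tukey recursion and shows that the same code computes either a complex DFT or an NTT, depending only on the ring the arithmetic is done in. The toy parameters (prime P = 17, length N = 8, root of unity 9) and the names `transform`, `naive`, `mod_p` and `identity` are hypothetical choices for this example, not values taken from the paper.

```python
import cmath

# Hypothetical toy parameters, chosen only for illustration.
P = 17      # small prime modulus
N = 8       # transform length; must divide P - 1 = 16
OMEGA = 9   # 9 has order 8 mod 17 (9**4 % 17 == 16, i.e. -1): a primitive 8th root of unity


def mod_p(x):
    """Reduce into Z/PZ; this is the ring arithmetic used for the NTT."""
    return x % P


def identity(x):
    """No reduction; this is the arithmetic used for the complex DFT."""
    return x


def naive(a, omega, reduce_fn):
    """Direct O(n^2) definition: X[k] = sum_j a[j] * omega**(j*k)."""
    n = len(a)
    out = []
    for k in range(n):
        acc = 0
        for j in range(n):
            acc = reduce_fn(acc + a[j] * omega ** (j * k))
        out.append(acc)
    return out


def transform(a, omega, reduce_fn):
    """Recursive radix-2 Cooley-Tukey FFT, O(n log n).

    len(a) must be a power of two and omega a primitive len(a)-th root of
    unity in the ring described by reduce_fn.
    """
    n = len(a)
    if n == 1:
        return list(a)
    half = n // 2
    omega_sq = reduce_fn(omega * omega)          # primitive (n/2)-th root of unity
    even = transform(a[0::2], omega_sq, reduce_fn)
    odd = transform(a[1::2], omega_sq, reduce_fn)
    out = [0] * n
    w = 1
    for k in range(half):
        t = reduce_fn(w * odd[k])                # twiddle factor applied to the odd half
        out[k] = reduce_fn(even[k] + t)          # butterfly: X[k] = E[k] + w^k * O[k]
        out[k + half] = reduce_fn(even[k] - t)   # X[k + n/2] = E[k] - w^k * O[k], as omega**(n/2) == -1
        w = reduce_fn(w * omega)
    return out


if __name__ == "__main__":
    a = [1, 2, 3, 4, 0, 0, 0, 0]

    # NTT over Z/17Z: the fast version matches the O(n^2) definition.
    X = transform(a, OMEGA, mod_p)
    assert X == naive(a, OMEGA, mod_p)

    # Inverse NTT: transform with omega^-1, then scale by N^-1 mod P.
    inv = transform(X, pow(OMEGA, -1, P), mod_p)
    assert [mod_p(x * pow(N, -1, P)) for x in inv] == a

    # Complex DFT: identical recursion, with omega = exp(-2*pi*i/N).
    w_c = cmath.exp(-2j * cmath.pi / N)
    Y = transform(a, w_c, identity)
    assert all(abs(y - z) < 1e-9 for y, z in zip(Y, naive(a, w_c, identity)))
    print("NTT:", X)
```

Running it prints `NTT: [10, 16, 6, 11, 15, 13, 7, 15]` (the modular inverse via `pow(x, -1, p)` needs Python 3.8 or later). Practical NTTs, such as those used in homomorphic encryption, work with much larger primes and iterative, in-place butterflies rather than this recursive, list-allocating form.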
Author affiliations
1. Sultan Durrani, University of Illinois at Urbana-Champaign (sultand2@illinois.edu)
2. Muhammad Saad Chughtai, Georgia Institute of Technology (chughtai@gatech.edu)
3. Mert Hidayetoglu, University of Illinois at Urbana-Champaign (hidayet2@illinois.edu)
4. Rashid Tahir, University of Prince Mugrin (r.tahir@upm.edu.sa)
5. Abdul Dakkak, University of Illinois at Urbana-Champaign (dakkak@illinois.edu)
6. Lawrence Rauchwerger, University of Illinois at Urbana-Champaign (rwerger@illinois.edu)
7. Fareed Zaffar, Lahore University of Management Sciences (fareed.zaffar@lums.edu.pk)
8. Wen-mei Hwu, University of Illinois at Urbana-Champaign (hwu@illinois.edu)
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/PACT52795.2021.00032
EISBN 9781665442787
1665442786
EndPage 355
ExternalDocumentID 9563043
Genre orig-research
ISICitedReferencesCount 17
IsPeerReviewed false
IsScholarly true
PageCount 11
PublicationDate 2021-Sept.
PublicationTitle 2021 30th International Conference on Parallel Architectures and Compilation Techniques (PACT)
PublicationTitleAbbrev PACT
PublicationYear 2021
Publisher IEEE
StartPage 345
SubjectTerms cyphertext
DFT
Discrete Fourier transforms
FFT
GPU
Graphics processing units
homomorphic encryption
Instruction sets
NTT
Parallel architectures
Tensor Cores
Tensors
Transforms
URI https://ieeexplore.ieee.org/document/9563043