ReTransformer: ReRAM-based Processing-in-Memory Architecture for Transformer Acceleration

Transformer has emerged as a popular deep neural network (DNN) model for Natural Language Processing (NLP) applications and demonstrated excellent performance in neural machine translation, entity recognition, etc. However, its scaled dot-product attention mechanism in the auto-regressive decoder brings...

Bibliographic Details
Published in: Digest of technical papers - IEEE/ACM International Conference on Computer-Aided Design, pp. 1-9
Main Authors: Yang, Xiaoxuan, Yan, Bonan, Li, Hai, Chen, Yiran
Format: Conference Proceeding
Language: English
Published: Association for Computing Machinery, 02.11.2020
Subjects:
ISSN: 1558-2434
Online Access: Get full text
Abstract Transformer has emerged as a popular deep neural network (DNN) model for Natural Language Processing (NLP) applications and demonstrated excellent performance in neural machine translation, entity recognition, etc. However, its scaled dot-product attention mechanism in the auto-regressive decoder brings a performance bottleneck during inference. Transformer is also computationally and memory intensive and demands a hardware acceleration solution. Although researchers have successfully applied ReRAM-based Processing-in-Memory (PIM) to accelerate convolutional neural networks (CNNs) and recurrent neural networks (RNNs), the unique computation process of the scaled dot-product attention in Transformer makes it difficult to directly apply these designs. Besides, how to handle intermediate results in Matrix-matrix Multiplication (MatMul) and how to design a pipeline at a finer granularity of Transformer remain unsolved. In this work, we propose ReTransformer - a ReRAM-based PIM architecture for Transformer acceleration. ReTransformer can not only accelerate the scaled dot-product attention of Transformer using ReRAM-based PIM but also eliminate some data dependencies by avoiding writing the intermediate results, using the proposed matrix decomposition technique. Moreover, we propose a new sub-matrix pipeline design for multi-head self-attention. Experimental results show that compared to GPU and PipeLayer, ReTransformer improves computing efficiency by 23.21× and 3.25×, respectively. The corresponding overall power is reduced by 1086× and 2.82×, respectively.
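For context, the scaled dot-product attention referred to in the abstract is the standard Transformer formulation; the equation below restates that generic definition (it is not the paper's matrix decomposition itself) only to show where the intermediate MatMul result arises:

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V

Here Q, K, and V are the query, key, and value matrices and d_k is the key dimension; the product Q K^T (and the subsequent softmax output) is the kind of intermediate result whose write-back the proposed matrix decomposition technique is described as avoiding.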
Authors:
– Xiaoxuan Yang, Duke University, Durham, NC, USA (xy92@duke.edu)
– Bonan Yan, Duke University, Durham, NC, USA (bonan.yan@duke.edu)
– Hai Li, Duke University, Durham, NC, USA (hai.li@duke.edu)
– Yiran Chen, Duke University, Durham, NC, USA (yiran.chen@duke.edu)
DOI 10.1145/3400302.3415640
Discipline Engineering
EISBN 9781665423243
1665423242
EISSN 1558-2434
Funding: ARO grant W911NF-19-2-0107 (funder ID 10.13039/100000183); NSF grants 1955246, 1910299, and 1725456 (funder ID 10.13039/501100001809)
ISICitedReferencesCount 92
PublicationTitleAbbrev ICCAD
SubjectTerms Acceleration
autoregressive decoder
Computational modeling
Computer architecture
convolutional neural networks
Decoding
deep neural network model
hardware acceleration solution
learning (artificial intelligence)
mathematics computing
matrix decomposition
matrix multiplication
matrix-matrix multiplication
memory architecture
multi-threading
natural language processing
neural language processing applications
neural machine translation
performance evaluation
Pipelines
processing-in-memory
recurrent neural nets
recurrent neural networks
ReRAM
ReRAM-based PIM architecture
ReRAM-based processing-in-memory architecture
ReTransformer
scaled dot-product attention mechanism
submatrix pipeline design
Transformer
Virtual machine monitors
URI https://ieeexplore.ieee.org/document/9256523