ReTransformer: ReRAM-based Processing-in-Memory Architecture for Transformer Acceleration

Bibliographic Details
Published in: Digest of technical papers - IEEE/ACM International Conference on Computer-Aided Design, pp. 1-9
Main Authors: Yang, Xiaoxuan; Yan, Bonan; Li, Hai; Chen, Yiran
Format: Conference Proceeding
Language: English
Published: Association for Computing Machinery, 2 November 2020
Subjects:
ISSN: 1558-2434
Online Access: Full Text
Abstract Transformer has emerged as a popular deep neural network (DNN) model for Neural Language Processing (NLP) applications and demonstrated excellent performance in neural machine translation, entity recognition, etc. However, its scaled dot-product attention mechanism in the auto-regressive decoder brings a performance bottleneck during inference. Transformer is also computationally and memory intensive and demands a hardware acceleration solution. Although researchers have successfully applied ReRAM-based Processing-in-Memory (PIM) to accelerate convolutional neural networks (CNNs) and recurrent neural networks (RNNs), the unique computation process of the scaled dot-product attention in Transformer makes it difficult to directly apply these designs. Besides, how to handle intermediate results in Matrix-matrix Multiplication (MatMul) and how to design a pipeline at a finer granularity of Transformer remain unsolved. In this work, we propose ReTransformer - a ReRAM-based PIM architecture for Transformer acceleration. ReTransformer can not only accelerate the scaled dot-product attention of Transformer using ReRAM-based PIM but also eliminate some data dependency by avoiding writing the intermediate results using the proposed matrix decomposition technique. Moreover, we propose a new sub-matrix pipeline design for multi-head self-attention. Experimental results show that compared to GPU and Pipelayer, ReTransformer improves computing efficiency by 23.21× and 3.25×, respectively. The corresponding overall power is reduced by 1086× and 2.82×, respectively.
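For context, the scaled dot-product attention that the abstract identifies as the inference bottleneck follows the standard Transformer formulation below. This is the textbook definition (with Q, K, and V all derived from the same input X in self-attention), given here only as background; ReTransformer's actual matrix decomposition and ReRAM crossbar mapping are described in the paper itself.

  % Standard scaled dot-product attention (background only, not ReTransformer's
  % specific ReRAM mapping); d_k denotes the per-head key dimension.
  \[
    \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V,
    \qquad Q = X W^{Q},\quad K = X W^{K},\quad V = X W^{V}
  \]

Because Q and K are both products of the same input X, computing QK^T naively requires materializing these intermediate matrices before the second MatMul; per the abstract, this write of intermediate results is what the proposed matrix decomposition technique avoids.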
Author Yang, Xiaoxuan
Chen, Yiran
Li, Hai
Yan, Bonan
Author_xml – sequence: 1
  givenname: Xiaoxuan
  surname: Yang
  fullname: Yang, Xiaoxuan
  email: xy92@duke.edu
  organization: Duke University,Durham,NC,USA
– sequence: 2
  givenname: Bonan
  surname: Yan
  fullname: Yan, Bonan
  email: bonan.yan@duke.edu
  organization: Duke University,Durham,NC,USA
– sequence: 3
  givenname: Hai
  surname: Li
  fullname: Li, Hai
  email: hai.li@duke.edu
  organization: Duke University,Durham,NC,USA
– sequence: 4
  givenname: Yiran
  surname: Chen
  fullname: Chen, Yiran
  email: yiran.chen@duke.edu
  organization: Duke University,Durham,NC,USA
ContentType Conference Proceeding
DOI 10.1145/3400302.3415640
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
EISBN 9781665423243
1665423242
EISSN 1558-2434
EndPage 9
ExternalDocumentID 9256523
Genre orig-research
GrantInformation_xml – fundername: ARO
  grantid: W911NF-19-2-0107.
  funderid: 10.13039/100000183
– fundername: NSF
  grantid: 1955246,1910299,1725456
  funderid: 10.13039/501100001809
ISICitedReferencesCount 92
IngestDate Wed Aug 27 02:28:32 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
PageCount 9
ParticipantIDs ieee_primary_9256523
PublicationCentury 2000
PublicationDate 2020-Nov.-2
PublicationDateYYYYMMDD 2020-11-02
PublicationDate_xml – month: 11
  year: 2020
  text: 2020-Nov.-2
  day: 02
PublicationDecade 2020
PublicationTitle Digest of technical papers - IEEE/ACM International Conference on Computer-Aided Design
PublicationTitleAbbrev ICCAD
PublicationYear 2020
Publisher Association for Computing Machinery
Publisher_xml – name: Association for Computing Machinery
SSID ssib044045988
ssj0020286
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms Acceleration
autoregressive decoder
Computational modeling
Computer architecture
convolutional neural networks
Decoding
deep neural network model
hardware acceleration solution
learning (artificial intelligence)
mathematics computing
matrix decomposition
matrix multiplication
matrix-matrix multiplication
memory architecture
multi-threading
natural language processing
neural language processing applications
neural machine translation
performance evaluation
Pipelines
processing-in-memory
recurrent neural nets
recurrent neural networks
ReRAM
ReRAM-based PIM architecture
ReRAM-based processing-in-memory architecture
ReTransformer
scaled dot-product attention mechanism
submatrix pipeline design
Transformer
Virtual machine monitors
Title ReTransformer: ReRAM-based Processing-in-Memory Architecture for Transformer Acceleration
URI https://ieeexplore.ieee.org/document/9256523
WOSCitedRecordID wos000671087100051
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider IEEE