ReTransformer: ReRAM-based Processing-in-Memory Architecture for Transformer Acceleration
| Published in: | Digest of technical papers - IEEE/ACM International Conference on Computer-Aided Design, pp. 1 - 9 |
|---|---|
| Main authors: | Yang, Xiaoxuan; Yan, Bonan; Li, Hai; Chen, Yiran |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | Association for Computing Machinery, 02.11.2020 |
| ISSN: | 1558-2434 |
| Online access: | Full text |
| Abstract | Transformer has emerged as a popular deep neural network (DNN) model for Natural Language Processing (NLP) applications and has demonstrated excellent performance in neural machine translation, entity recognition, etc. However, the scaled dot-product attention mechanism in its auto-regressive decoder creates a performance bottleneck during inference. Transformer is also computationally and memory intensive and demands a hardware acceleration solution. Although researchers have successfully applied ReRAM-based Processing-in-Memory (PIM) to accelerate convolutional neural networks (CNNs) and recurrent neural networks (RNNs), the unique computation process of the scaled dot-product attention in Transformer makes it difficult to directly apply these designs. Moreover, how to handle intermediate results in matrix-matrix multiplication (MatMul) and how to design a pipeline at a finer granularity of Transformer remain unsolved. In this work, we propose ReTransformer, a ReRAM-based PIM architecture for Transformer acceleration. ReTransformer not only accelerates the scaled dot-product attention of Transformer using ReRAM-based PIM but also eliminates some data dependencies by avoiding writing back the intermediate results, using the proposed matrix decomposition technique. We further propose a new sub-matrix pipeline design for multi-head self-attention. Experimental results show that, compared to GPU and PipeLayer, ReTransformer improves computing efficiency by 23.21× and 3.25×, respectively. The corresponding overall power is reduced by 1086× and 2.82×, respectively. |
|---|---|
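For context on the bottleneck the abstract describes: scaled dot-product attention computes softmax(QK^T / sqrt(d_k))V, and the (seq_len × seq_len) score matrix QK^T is the intermediate MatMul result that a straightforward ReRAM PIM mapping would have to write back into crossbars before the second multiplication. The sketch below is a minimal NumPy illustration of this standard computation (following the original Transformer formulation), not the paper's ReRAM mapping or its matrix decomposition technique; all shapes, names, and sizes here are illustrative assumptions.

```python
# Minimal sketch of standard scaled dot-product attention, for one head.
# Illustrative only; not ReTransformer's ReRAM implementation.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: (seq_len, d_k) matrices for a single attention head."""
    d_k = Q.shape[-1]
    # Intermediate MatMul result: a (seq_len x seq_len) score matrix.
    # In a naive ReRAM PIM mapping this result would need to be written
    # into crossbars before the second MatMul -- the data dependency
    # that the paper's matrix decomposition technique aims to avoid.
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable row-wise softmax over the scores.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # second MatMul

# Toy usage: sequence length 4, head dimension 8 (assumed sizes).
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```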
| Authors | Yang, Xiaoxuan (xy92@duke.edu); Yan, Bonan (bonan.yan@duke.edu); Li, Hai (hai.li@duke.edu); Chen, Yiran (yiran.chen@duke.edu); all Duke University, Durham, NC, USA |
| ContentType | Conference Proceeding |
| DOI | 10.1145/3400302.3415640 |
| Discipline | Engineering |
| EISBN | 9781665423243 1665423242 |
| EISSN | 1558-2434 |
| EndPage | 9 |
| ExternalDocumentID | 9256523 |
| Genre | orig-research |
| Funding | ARO grant W911NF-19-2-0107; NSF grants 1955246, 1910299, and 1725456 |
| ISICitedReferencesCount | 92 |
| Language | English |
| PageCount | 9 |
| PublicationDate | 2020-11-02 |
| PublicationTitle | Digest of technical papers - IEEE/ACM International Conference on Computer-Aided Design |
| PublicationTitleAbbrev | ICCAD |
| PublicationYear | 2020 |
| Publisher | Association for Computing Machinery |
| StartPage | 1 |
| SubjectTerms | Acceleration; autoregressive decoder; Computational modeling; Computer architecture; convolutional neural networks; Decoding; deep neural network model; hardware acceleration solution; learning (artificial intelligence); mathematics computing; matrix decomposition; matrix multiplication; matrix-matrix multiplication; memory architecture; multi-threading; natural language processing; neural language processing applications; neural machine translation; performance evaluation; Pipelines; processing-in-memory; recurrent neural nets; recurrent neural networks; ReRAM; ReRAM-based PIM architecture; ReRAM-based processing-in-memory architecture; ReTransformer; scaled dot-product attention mechanism; submatrix pipeline design; Transformer; Virtual machine monitors |
| Title | ReTransformer: ReRAM-based Processing-in-Memory Architecture for Transformer Acceleration |
| URI | https://ieeexplore.ieee.org/document/9256523 |