ReTransformer: ReRAM-based Processing-in-Memory Architecture for Transformer Acceleration
| Published in: | Digest of Technical Papers - IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 1-9 |
|---|---|
| Main Authors: | Yang, Xiaoxuan; Yan, Bonan; Li, Hai; Chen, Yiran |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | Association for Computing Machinery, November 2, 2020 |
| Subjects: | |
| ISSN: | 1558-2434 |
| Online Access: | https://ieeexplore.ieee.org/document/9256523 |
| Abstract | Transformer has emerged as a popular deep neural network (DNN) model for Natural Language Processing (NLP) applications and has demonstrated excellent performance in neural machine translation, entity recognition, etc. However, its scaled dot-product attention mechanism in the auto-regressive decoder creates a performance bottleneck during inference. Transformer is also computationally and memory intensive and demands a hardware acceleration solution. Although researchers have successfully applied ReRAM-based Processing-in-Memory (PIM) to accelerate convolutional neural networks (CNNs) and recurrent neural networks (RNNs), the unique computation process of the scaled dot-product attention in Transformer makes it difficult to directly apply these designs. Besides, how to handle intermediate results in Matrix-matrix Multiplication (MatMul) and how to design a pipeline at a finer granularity of Transformer remain unsolved. In this work, we propose ReTransformer - a ReRAM-based PIM architecture for Transformer acceleration. ReTransformer can not only accelerate the scaled dot-product attention of Transformer using ReRAM-based PIM but also eliminate some data dependency by avoiding writing the intermediate results using the proposed matrix decomposition technique. Moreover, we propose a new sub-matrix pipeline design for multi-head self-attention. Experimental results show that, compared to GPU and PipeLayer, ReTransformer improves computing efficiency by 23.21× and 3.25×, respectively. The corresponding overall power is reduced by 1086× and 2.82×, respectively. |
|---|---|
| Author | Yang, Xiaoxuan; Yan, Bonan; Li, Hai; Chen, Yiran |
| Author Details | Xiaoxuan Yang (xy92@duke.edu), Bonan Yan (bonan.yan@duke.edu), Hai Li (hai.li@duke.edu), Yiran Chen (yiran.chen@duke.edu), all with Duke University, Durham, NC, USA |
| ContentType | Conference Proceeding |
| DOI | 10.1145/3400302.3415640 |
| Discipline | Engineering |
| EISBN | 9781665423243; 1665423242 |
| EISSN | 1558-2434 |
| EndPage | 9 |
| ExternalDocumentID | 9256523 |
| Genre | orig-research |
| GrantInformation | ARO grant W911NF-19-2-0107; NSF grants 1955246, 1910299, and 1725456 |
| ISICitedReferencesCount | 92 |
| Language | English |
| PageCount | 9 |
| PublicationDate | 2020-Nov.-2 |
| PublicationTitle | Digest of technical papers - IEEE/ACM International Conference on Computer-Aided Design |
| PublicationTitleAbbrev | ICCAD |
| PublicationYear | 2020 |
| Publisher | Association for Computing Machinery |
| StartPage | 1 |
| SubjectTerms | Acceleration; autoregressive decoder; Computational modeling; Computer architecture; convolutional neural networks; Decoding; deep neural network model; hardware acceleration solution; learning (artificial intelligence); mathematics computing; matrix decomposition; matrix multiplication; matrix-matrix multiplication; memory architecture; multi-threading; natural language processing; neural language processing applications; neural machine translation; performance evaluation; Pipelines; processing-in-memory; recurrent neural nets; recurrent neural networks; ReRAM; ReRAM-based PIM architecture; ReRAM-based processing-in-memory architecture; ReTransformer; scaled dot-product attention mechanism; submatrix pipeline design; Transformer; Virtual machine monitors |
| Title | ReTransformer: ReRAM-based Processing-in-Memory Architecture for Transformer Acceleration |
| URI | https://ieeexplore.ieee.org/document/9256523 |
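
The abstract above identifies the scaled dot-product attention MatMul as the inference bottleneck and credits a matrix decomposition technique with avoiding writes of intermediate results into ReRAM crossbars. As a hedged illustration only (the notation follows the standard Transformer formulation; whether this transpose identity matches the exact decomposition used in ReTransformer should be checked against the full paper), the sketch below shows how rewriting K in terms of the input X can let Q K^T be evaluated against data already resident in the crossbar, without first materializing K:

```latex
% Standard scaled dot-product attention, with input X, learned projections
% W^Q, W^K, W^V, and key dimension d_k.
\[
  Q = XW^{Q}, \qquad K = XW^{K}, \qquad V = XW^{V}
\]
\[
  \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V
\]
% Illustrative decomposition (plain transpose algebra; an assumption for
% illustration, not quoted from the paper): expressing K^T through X avoids
% writing the intermediate matrix K before the Q K^T MatMul.
\[
  QK^{\top} = Q\,\bigl(XW^{K}\bigr)^{\top} = \bigl(Q\,(W^{K})^{\top}\bigr)\,X^{\top}
\]
```

The identity itself is just (AB)^T = B^T A^T; the reported 23.21× and 3.25× efficiency gains additionally depend on the crossbar mapping and the sub-matrix pipeline design described in the paper.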