Low-Complexity Distributed-Arithmetic-Based Pipelined Architecture for an LSTM Network
Long short-term memory (LSTM) networks have addressed the shortcomings of recurrent neural networks, such as vanishing gradients and the lack of ability in developing connections across discontinuous parts of sequences. However, the implementations of state-of-the-art LSTM networks face the computat...
| Published in: | IEEE transactions on very large scale integration (VLSI) systems Vol. 28; no. 2; pp. 329 - 338 |
|---|---|
| Main Authors: | Yalamarthy, Krishna Praveen; Dhall, Saurabh; Khan, Mohd. Tasleem; Shaik, Rafi Ahamed |
| Format: | Journal Article |
| Language: | English |
| Published: | New York; IEEE; 01.02.2020; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Subjects: | |
| ISSN: | 1063-8210, 1557-9999 |
| Abstract | Long short-term memory (LSTM) networks have addressed the shortcomings of recurrent neural networks, such as vanishing gradients and the lack of ability in developing connections across discontinuous parts of sequences. However, the implementations of state-of-the-art LSTM networks face the computational bottleneck of having multiple high-order matrix-vector multiplications (MVMs). This article presents a generalized approach to accelerate a circulant MVM (C-MVM), and hence, it is applicable to many neural networks. The proposed scheme presents a novel low-complexity distributed arithmetic (DA) architecture for optimizing C-MVMs. Unlike conventional offset binary coding-based DA (OBC-DA), it is based on separate generation and selection of partial products. Only one partial product generator (PPG) with several partial product selectors (PPSs) is required. The complexity of PPSs is reduced by sharing the minterms across Boolean expressions. Fine-grained pipelining is employed to achieve approximately one adder delay. From the implementation results, the proposed design with 512 × 512 LSTM layer occupies 74.54% less core area, consumes 68.66% less core power, offers 2.61 times more throughput, and 3.89 times more hardware efficiency over the best existing design. |
|---|---|
| Author | Khan, Mohd. Tasleem; Shaik, Rafi Ahamed; Yalamarthy, Krishna Praveen; Dhall, Saurabh |
| Author_xml | – sequence: 1 givenname: Krishna Praveen surname: Yalamarthy fullname: Yalamarthy, Krishna Praveen email: y.krishna@iitg.ac.in organization: Georgia Institute of Technology, Atlanta, GA, USA – sequence: 2 givenname: Saurabh orcidid: 0000-0002-7764-6681 surname: Dhall fullname: Dhall, Saurabh email: d.saurabh@iitg.ac.in organization: Samsung R & D Institute, Noida, India – sequence: 3 givenname: Mohd. Tasleem orcidid: 0000-0001-6106-1534 surname: Khan fullname: Khan, Mohd. Tasleem email: tasleem@iitg.ac.in organization: Taiwan Semiconductor Manufacturing Company Limited (TSMC), Hsinchu, Taiwan – sequence: 4 givenname: Rafi Ahamed orcidid: 0000-0003-1617-2299 surname: Shaik fullname: Shaik, Rafi Ahamed email: rafiahamed@iitg.ac.in organization: Department of Electronics and Electrical Engineering, IIT Guwahati, Guwahati, India |
| CODEN | IEVSE9 |
| CitedBy_id | crossref_primary_10_1109_LCA_2024_3379002 crossref_primary_10_1109_TCSI_2022_3217091 crossref_primary_10_1109_TVLSI_2021_3135353 crossref_primary_10_1145_3534969 crossref_primary_10_1109_ACCESS_2025_3591720 crossref_primary_10_1109_ACCESS_2025_3604713 crossref_primary_10_1109_TCSI_2024_3464687 crossref_primary_10_1109_TCSII_2022_3196398 crossref_primary_10_1109_ACCESS_2025_3555583 crossref_primary_10_1109_ACCESS_2025_3591772 crossref_primary_10_1109_JETCAS_2023_3330428 crossref_primary_10_1007_s10470_025_02488_9 crossref_primary_10_1109_TVLSI_2025_3528244 crossref_primary_10_1145_3699512 crossref_primary_10_1007_s11063_023_11187_3 crossref_primary_10_1109_TCSI_2022_3153560 crossref_primary_10_1109_TNNLS_2024_3425569 crossref_primary_10_1109_TVLSI_2023_3294571 crossref_primary_10_1007_s00034_023_02456_6 crossref_primary_10_1007_s00034_023_02412_4 |
| Cites_doi | 10.1109/HPCA.2019.00028 10.1145/2847263.2847265 10.1109/ICCV.2015.327 10.1109/MSP.2012.2205597 10.1145/3020078.3021745 10.1145/3007787.3001163 10.1109/ISVLSI.2016.129 10.1109/ASAP.2008.4580190 10.1145/2644865.2541967 10.1145/2872887.2750389 10.1109/IJCNN.2000.861302 10.1109/SIPS.1997.626129 10.1109/TSP.2006.881269 10.1109/TCSI.2018.2867291 10.1109/SiPS.2016.48 10.1109/TCSII.2013.2251968 10.1109/TVLSI.2017.2717950 10.1145/3174243.3174253 |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020 |
| Copyright_xml | – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020 |
| DBID | 97E RIA RIE AAYXX CITATION 7SP 8FD L7M |
| DOI | 10.1109/TVLSI.2019.2941921 |
| DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005–Present IEEE All-Society Periodicals Package (ASPP) 1998–Present IEEE Electronic Library (IEL) CrossRef Electronics & Communications Abstracts Technology Research Database Advanced Technologies Database with Aerospace |
| DatabaseTitle | CrossRef Technology Research Database Advanced Technologies Database with Aerospace Electronics & Communications Abstracts |
| DatabaseTitleList | Technology Research Database |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Engineering |
| EISSN | 1557-9999 |
| EndPage | 338 |
| ExternalDocumentID | 10_1109_TVLSI_2019_2941921 8861300 |
| Genre | orig-research |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 23 |
| ISSN | 1063-8210 |
| IngestDate | Sun Nov 09 06:40:41 EST 2025 Sat Nov 29 03:36:16 EST 2025 Tue Nov 18 22:12:33 EST 2025 Wed Aug 27 02:40:22 EDT 2025 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 2 |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
| LinkModel | DirectLink |
| Notes | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 |
| ORCID | 0000-0001-6106-1534 0000-0002-7764-6681 0000-0003-1617-2299 |
| PQID | 2345522878 |
| PQPubID | 85424 |
| PageCount | 10 |
| ParticipantIDs | ieee_primary_8861300 proquest_journals_2345522878 crossref_citationtrail_10_1109_TVLSI_2019_2941921 crossref_primary_10_1109_TVLSI_2019_2941921 |
| PublicationCentury | 2000 |
| PublicationDate | 2020-02-01 |
| PublicationDateYYYYMMDD | 2020-02-01 |
| PublicationDate_xml | – month: 02 year: 2020 text: 2020-02-01 day: 01 |
| PublicationDecade | 2020 |
| PublicationPlace | New York |
| PublicationPlace_xml | – name: New York |
| PublicationTitle | IEEE transactions on very large scale integration (VLSI) systems |
| PublicationTitleAbbrev | TVLSI |
| PublicationYear | 2020 |
| Publisher | IEEE; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Publisher_xml | – name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| SSID | ssj0014490 |
| Score | 2.3872569 |
| SourceID | proquest crossref ieee |
| SourceType | Aggregation Database Enrichment Source Index Database Publisher |
| StartPage | 329 |
| SubjectTerms | Acceleration; Arithmetic; Binary codes; Boolean algebra; Complexity; Complexity theory; Computer architecture; Distributed arithmetic (DA); Hardware; Logic gates; long short-term memory (LSTM) networks; Mathematical analysis; Matrix algebra; Matrix methods; Memory management; Neural networks; offset binary coding (OBC); Power consumption; Recurrent neural networks; Selectors; Table lookup |
| Title | Low-Complexity Distributed-Arithmetic-Based Pipelined Architecture for an LSTM Network |
| URI | https://ieeexplore.ieee.org/document/8861300 https://www.proquest.com/docview/2345522878 |
| Volume | 28 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
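
The abstract refers to distributed arithmetic (DA) applied to circulant matrix-vector multiplication (C-MVM). As a rough illustration only, the sketch below shows conventional lookup-table DA in software; it is not the article's PPG/PPS architecture and does not use offset binary coding. It assumes signed integer inputs of a fixed bit width in two's complement, and all function names (`build_lut`, `rotate_right`, `da_circulant_matvec`, `circulant_matvec_reference`) are hypothetical. Because every row of a circulant matrix is a cyclic shift of the first row, one table built for the first row can serve all rows if the table address is rotated.

```python
def circulant_matvec_reference(c, x):
    """Direct product y = C x, where C[i][j] = c[(i - j) % n]."""
    n = len(c)
    return [sum(c[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]


def build_lut(weights):
    """Table of all 2**n partial sums: lut[addr] = sum of weights[j] where bit j of addr is set."""
    n = len(weights)
    lut = [0] * (1 << n)
    for addr in range(1, 1 << n):
        low = addr & -addr                    # lowest set bit of addr
        j = low.bit_length() - 1              # index of that bit
        lut[addr] = lut[addr ^ low] + weights[j]
    return lut


def rotate_right(addr, shift, n):
    """Rotate an n-bit address right by `shift` positions."""
    shift %= n
    mask = (1 << n) - 1
    return ((addr >> shift) | (addr << (n - shift))) & mask


def da_circulant_matvec(c, x, bits=8):
    """Bit-serial DA: one table lookup per output row per input bit plane."""
    n = len(c)
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if len(x) != n or any(not lo <= v <= hi for v in x):
        raise ValueError("x must hold n signed integers that fit in `bits` bits")

    row0 = [c[(-j) % n] for j in range(n)]    # first row of the circulant matrix
    lut = build_lut(row0)                     # single table shared by every row
    word_mask = (1 << bits) - 1

    y = [0] * n
    for b in range(bits):
        # Collect bit b of every input element into one n-bit address.
        addr = 0
        for j, v in enumerate(x):
            addr |= (((v & word_mask) >> b) & 1) << j
        # The most significant bit carries negative weight in two's complement.
        scale = -(1 << b) if b == bits - 1 else (1 << b)
        for i in range(n):
            # Row i equals row 0 shifted cyclically, so rotate the address instead.
            y[i] += scale * lut[rotate_right(addr, i, n)]
    return y
```

The table has 2**n entries, so this form is only practical for small n; hardware designs typically split long vectors into short groups, which this sketch does not attempt.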