Low-Complexity Distributed-Arithmetic-Based Pipelined Architecture for an LSTM Network

Bibliographic Details
Published in: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 28, no. 2, pp. 329-338
Main Authors: Yalamarthy, Krishna Praveen; Dhall, Saurabh; Khan, Mohd. Tasleem; Shaik, Rafi Ahamed
Format: Journal Article
Language: English
Published: New York: IEEE, 01.02.2020
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:
ISSN: 1063-8210, 1557-9999
Abstract: Long short-term memory (LSTM) networks have addressed the shortcomings of recurrent neural networks, such as vanishing gradients and the lack of ability in developing connections across discontinuous parts of sequences. However, the implementations of state-of-the-art LSTM networks face the computational bottleneck of having multiple high-order matrix-vector multiplications (MVMs). This article presents a generalized approach to accelerate a circulant MVM (C-MVM), and hence, it is applicable to many neural networks. The proposed scheme presents a novel low-complexity distributed arithmetic (DA) architecture for optimizing C-MVMs. Unlike conventional offset binary coding-based DA (OBC-DA), it is based on separate generation and selection of partial products. Only one partial product generator (PPG) with several partial product selectors (PPSs) is required. The complexity of PPSs is reduced by sharing the minterms across Boolean expressions. Fine-grained pipelining is employed to achieve approximately one adder delay. From the implementation results, the proposed design with 512 × 512 LSTM layer occupies 74.54% less core area, consumes 68.66% less core power, offers 2.61 times more throughput, and 3.89 times more hardware efficiency over the best existing design.
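The two ideas the abstract combines, a circulant MVM and distributed arithmetic, can be illustrated with a small software sketch. This is only a textbook-style, bit-serial DA model written for illustration; it is not the PPG/PPS architecture proposed in the article, and the function names (`circulant_matvec`, `build_da_table`, `da_inner_product`, `da_circulant_matvec`) as well as the indexing convention `C[i][j] = first_col[(i - j) % n]` are assumptions made here.

```python
def circulant_matvec(first_col, x):
    """Reference result y = C x, assuming C[i][j] = first_col[(i - j) % n]."""
    n = len(first_col)
    return [sum(first_col[(i - j) % n] * x[j] for j in range(n))
            for i in range(n)]


def build_da_table(weights):
    """Precompute every partial sum of the weights.

    Bit j of the table address selects whether weights[j] is included,
    which is the lookup table used by conventional distributed arithmetic.
    """
    k = len(weights)
    return [sum(w for j, w in enumerate(weights) if (addr >> j) & 1)
            for addr in range(1 << k)]


def da_inner_product(weights, xs, bits):
    """Multiplier-free inner product for `bits`-bit two's-complement inputs.

    One table lookup and one shift-add per bit position; the sign bit
    carries weight -2**(bits - 1), so its partial sum is subtracted.
    """
    table = build_da_table(weights)
    acc = 0
    for b in range(bits):
        # Gather bit b of every input into one table address.
        addr = 0
        for j, x in enumerate(xs):
            addr |= ((x >> b) & 1) << j
        partial = table[addr] << b
        acc += -partial if b == bits - 1 else partial
    return acc


def da_circulant_matvec(first_col, x, bits):
    """Circulant MVM in which every row is evaluated with DA.

    Each row is a rotation of the same weight set, which is why hardware
    can share partial-product generation across rows; this sketch simply
    rebuilds the table per row for clarity.
    """
    n = len(first_col)
    return [da_inner_product([first_col[(i - j) % n] for j in range(n)], x, bits)
            for i in range(n)]


if __name__ == "__main__":
    col = [1, 2, 3, 4]
    vec = [5, -3, 7, -8]          # fits in 4-bit two's complement
    print(circulant_matvec(col, vec))
    print(da_circulant_matvec(col, vec, bits=4))
```

Both calls in the usage example should print the same vector, since the DA version only reorders the arithmetic. The table grows as 2^K for K weights, so practical designs split long inner products into small groups; how the article handles this is not stated in the abstract.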
Author Khan, Mohd. Tasleem
Shaik, Rafi Ahamed
Yalamarthy, Krishna Praveen
Dhall, Saurabh
Author_xml – sequence: 1
  givenname: Krishna Praveen
  surname: Yalamarthy
  fullname: Yalamarthy, Krishna Praveen
  email: y.krishna@iitg.ac.in
  organization: Georgia Institute of Technology, Atlanta, GA, USA
– sequence: 2
  givenname: Saurabh
  orcidid: 0000-0002-7764-6681
  surname: Dhall
  fullname: Dhall, Saurabh
  email: d.saurabh@iitg.ac.in
  organization: Samsung R & D Institute, Noida, India
– sequence: 3
  givenname: Mohd. Tasleem
  orcidid: 0000-0001-6106-1534
  surname: Khan
  fullname: Khan, Mohd. Tasleem
  email: tasleem@iitg.ac.in
  organization: Taiwan Semiconductor Manufacturing Company Limited (TSMC), Hsinchu, Taiwan
– sequence: 4
  givenname: Rafi Ahamed
  orcidid: 0000-0003-1617-2299
  surname: Shaik
  fullname: Shaik, Rafi Ahamed
  email: rafiahamed@iitg.ac.in
  organization: Department of Electronics and Electrical Engineering, IIT Guwahati, Guwahati, India
CODEN IEVSE9
CitedBy_id crossref_primary_10_1109_LCA_2024_3379002
crossref_primary_10_1109_TCSI_2022_3217091
crossref_primary_10_1109_TVLSI_2021_3135353
crossref_primary_10_1145_3534969
crossref_primary_10_1109_ACCESS_2025_3591720
crossref_primary_10_1109_ACCESS_2025_3604713
crossref_primary_10_1109_TCSI_2024_3464687
crossref_primary_10_1109_TCSII_2022_3196398
crossref_primary_10_1109_ACCESS_2025_3555583
crossref_primary_10_1109_ACCESS_2025_3591772
crossref_primary_10_1109_JETCAS_2023_3330428
crossref_primary_10_1007_s10470_025_02488_9
crossref_primary_10_1109_TVLSI_2025_3528244
crossref_primary_10_1145_3699512
crossref_primary_10_1007_s11063_023_11187_3
crossref_primary_10_1109_TCSI_2022_3153560
crossref_primary_10_1109_TNNLS_2024_3425569
crossref_primary_10_1109_TVLSI_2023_3294571
crossref_primary_10_1007_s00034_023_02456_6
crossref_primary_10_1007_s00034_023_02412_4
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020
DOI 10.1109/TVLSI.2019.2941921
Discipline Engineering
EISSN 1557-9999
EndPage 338
ExternalDocumentID 10_1109_TVLSI_2019_2941921
8861300
Genre orig-research
ISICitedReferencesCount 23
ISSN 1063-8210
IsPeerReviewed true
IsScholarly true
Issue 2
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
LinkModel DirectLink
ORCID 0000-0001-6106-1534
0000-0002-7764-6681
0000-0003-1617-2299
PublicationCentury 2000
PublicationDate 2020-02-01
PublicationDateYYYYMMDD 2020-02-01
PublicationDate_xml – month: 02
  year: 2020
  text: 2020-02-01
  day: 01
PublicationDecade 2020
PublicationPlace New York
PublicationPlace_xml – name: New York
PublicationTitle IEEE transactions on very large scale integration (VLSI) systems
PublicationTitleAbbrev TVLSI
PublicationYear 2020
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
SubjectTerms Acceleration
Arithmetic
Binary codes
Boolean algebra
Complexity
Complexity theory
Computer architecture
Distributed arithmetic (DA)
Hardware
Logic gates
Long short-term memory (LSTM) networks
Mathematical analysis
Matrix algebra
Matrix methods
Memory management
Neural networks
Offset binary coding (OBC)
Power consumption
Recurrent neural networks
Selectors
Table lookup
Title Low-Complexity Distributed-Arithmetic-Based Pipelined Architecture for an LSTM Network
URI https://ieeexplore.ieee.org/document/8861300
https://www.proquest.com/docview/2345522878
Volume 28