TIE: Energy-efficient Tensor Train-based Inference Engine for Deep Neural Network

Detailed bibliography
Published in: 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA), pp. 264-277
Main authors: Deng, Chunhua; Sun, Fangxuan; Qian, Xuehai; Lin, Jun; Wang, Zhongfeng; Yuan, Bo
Medium: Conference paper
Language: English
Published: ACM, June 2019
Topics: acceleration; artificial intelligence; artificial neural networks; CMOS technology; compression; computational modeling; computer architecture; deep learning; energy efficiency; engines; prototypes; tensor-train decomposition; tensors; throughput
ISSN: 2575-713X
EISBN: 9781450366694
DOI: 10.1145/3307650.3322258
Online access: https://ieeexplore.ieee.org/document/8980328
Abstract: In the era of artificial intelligence (AI), deep neural networks (DNNs) have emerged as the most important and powerful AI technique. However, large DNN models are both storage and computation intensive, posing significant challenges for adopting DNNs in resource-constrained scenarios. Model compression thus becomes a crucial technique for ensuring wide deployment of DNNs. This paper advances the state of the art by considering tensor train (TT) decomposition, a very promising but not yet explored compression technique in the architecture domain, which features an extremely high compression ratio. The challenge, however, is that inference on TT-format DNN models inherently incurs a massive amount of redundant computation, causing significant energy consumption; the straightforward application of TT decomposition is therefore not feasible. To address this fundamental challenge, this paper develops a computation-efficient inference scheme for TT-format DNNs with two key merits: 1) it achieves the theoretical limit on the number of multiplications, thus eliminating all redundant computations; and 2) its multi-stage processing scheme reduces the intensive memory accesses to the tensor cores, bringing significant energy savings. Based on this novel inference scheme, we develop TIE, an inference engine targeted at TT-format compressed DNNs. TIE is highly flexible, supporting different types of networks for different needs. A 16-processing-element (PE) prototype is implemented in 28 nm CMOS technology. Operating at 1000 MHz, the TIE accelerator occupies 1.74 mm² and consumes 154.8 mW. Compared with EIE, TIE achieves 7.22x-10.66x better area efficiency and 3.03x-4.48x better energy efficiency on different workloads, respectively. Compared with CirCNN, TIE achieves 5.96x and 4.56x higher throughput and energy efficiency, respectively.
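To make the compression idea concrete, the sketch below reconstructs a fully connected layer's weight matrix from its TT-cores and compares parameter counts. This is a minimal NumPy illustration of the standard TT-matrix format the paper builds on, not TIE's inference scheme (which avoids materializing the full matrix); the mode sizes, ranks, and the helper tt_to_matrix are hypothetical choices made only for this example.

import numpy as np

# Hypothetical TT factorization of a 128 x 128 weight matrix:
# output modes (4, 8, 4), input modes (4, 8, 4), TT-ranks (1, 4, 4, 1).
# cores[k] has shape (r_{k-1}, m_k, n_k, r_k) with r_0 = r_d = 1.
rng = np.random.default_rng(0)
out_modes, in_modes, ranks = (4, 8, 4), (4, 8, 4), (1, 4, 4, 1)
cores = [rng.standard_normal((ranks[k], out_modes[k], in_modes[k], ranks[k + 1]))
         for k in range(3)]

def tt_to_matrix(cores, out_modes, in_modes):
    """Contract the TT-cores back into the dense weight matrix (illustration only)."""
    t = cores[0][0]                                  # drop r_0 = 1 -> (m_1, n_1, r_1)
    for core in cores[1:]:
        t = np.tensordot(t, core, axes=([-1], [0]))  # chain cores along the TT-ranks
    t = t[..., 0]                                    # drop r_d = 1
    d = len(cores)
    perm = list(range(0, 2 * d, 2)) + list(range(1, 2 * d, 2))  # (m..., then n...)
    return t.transpose(perm).reshape(int(np.prod(out_modes)), int(np.prod(in_modes)))

W = tt_to_matrix(cores, out_modes, in_modes)         # dense (128, 128) weights
x = rng.standard_normal(128)
y = W @ x                                            # ordinary forward pass y = Wx

tt_params = sum(c.size for c in cores)               # 64 + 1024 + 64 = 1152 parameters
print(f"compression ratio: {W.size / tt_params:.1f}x")  # ~14.2x for this toy layer

Inference directly in the TT format contracts x against the cores instead of rebuilding W, but a naive contraction order recomputes shared partial products for every output; eliminating exactly that redundancy, down to the theoretical minimum number of multiplications, is what the paper's inference scheme achieves.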
Authors and affiliations:
1. Deng, Chunhua (Rutgers University)
2. Sun, Fangxuan (Nanjing University)
3. Qian, Xuehai (University of Southern California)
4. Lin, Jun (Nanjing University)
5. Wang, Zhongfeng (Nanjing University)
6. Yuan, Bo (Rutgers University)