TIE: Energy-efficient Tensor Train-based Inference Engine for Deep Neural Network
| Published in: | 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA), pp. 264-277 |
|---|---|
| Main authors: | Deng, Chunhua (Rutgers University); Sun, Fangxuan (Nanjing University); Qian, Xuehai (University of Southern California); Lin, Jun (Nanjing University); Wang, Zhongfeng (Nanjing University); Yuan, Bo (Rutgers University) |
| Format: | Conference paper |
| Language: | English |
| Published: | ACM, 01.06.2019 |
| Subjects: | Acceleration; Artificial intelligence; Artificial neural networks; CMOS technology; Compression; Computational modeling; Computer architecture; Deep learning; Energy efficiency; Engines; Prototypes; Tensor-train decomposition; Tensors; Throughput |
| ISSN: | 2575-713X |
| ISBN: | 9781450366694 |
| DOI: | 10.1145/3307650.3322258 |
| Online access: | Get full text (https://ieeexplore.ieee.org/document/8980328) |
| Abstract | In the era of artificial intelligence (AI), deep neural networks (DNNs) have emerged as the most important and powerful AI technique. However, large DNN models are both storage- and computation-intensive, posing significant challenges for adopting DNNs in resource-constrained scenarios. Model compression therefore becomes a crucial technique for ensuring the wide deployment of DNNs. This paper advances the state of the art by considering tensor train (TT) decomposition, a very promising but as-yet-unexplored compression technique in the architecture domain. The method features an extremely high compression ratio. The challenge, however, is that inference on TT-format DNN models inherently incurs a massive amount of redundant computation, causing significant energy consumption; the straightforward application of TT decomposition is thus not feasible. To address this fundamental challenge, this paper develops a computation-efficient inference scheme for TT-format DNNs with two key merits: 1) it achieves the theoretical lower bound on the number of multiplications, eliminating all redundant computation; and 2) its multi-stage processing scheme reduces intensive memory accesses to the tensor cores, bringing significant energy savings. Based on this novel inference scheme, we develop TIE, an inference engine targeted at TT-format compressed DNNs. TIE is highly flexible, supporting different types of networks for different needs. A 16-processing-element (PE) prototype is implemented in 28nm CMOS technology. Operating at 1000 MHz, the TIE accelerator occupies 1.74 mm² and consumes 154.8 mW. Compared with EIE, TIE achieves 7.22× to 10.66× better area efficiency and 3.03× to 4.48× better energy efficiency on different workloads, respectively. Compared with CirCNN, TIE achieves 5.96× and 4.56× higher throughput and energy efficiency, respectively. |
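The abstract's key technical point is that a TT-format fully-connected layer replaces one large dense weight matrix W with a chain of small 4-way tensor cores, and inference proceeds by contracting the input with one core at a time instead of ever reconstructing W. The following is a minimal NumPy sketch of that staged contraction; it is an illustration of generic TT-format inference, not TIE's actual scheme (which further reorders the computation to reach the multiplication lower bound and to cut memory accesses to the cores). The function names, core shapes, and test sizes below are all illustrative assumptions.

```python
import numpy as np

def tt_matvec(cores, x):
    """Compute y = W @ x with W stored in tensor-train (TT) format.

    cores[k] has shape (r_{k-1}, m_k, n_k, r_k) with r_0 = r_d = 1, so the
    dense W would have shape (prod m_k, prod n_k). W is never materialized:
    the input is contracted with one core at a time (the "multi-stage"
    processing the abstract refers to).
    """
    A = 1                     # product of output modes produced so far
    B = x.size                # product of input modes not yet consumed
    res = x.reshape(1, 1, B)  # axes: (output modes done, rank, pending input)
    for core in cores:
        r_in, m, n, r_out = core.shape
        B //= n
        # Split off the next input mode n, then sum over n and the incoming
        # rank r; this produces output mode m and the outgoing rank.
        res = np.einsum('arnb,rmns->amsb', res.reshape(A, r_in, n, B), core)
        A *= m
    return res.reshape(A)     # final rank and B are both 1

def tt_to_dense(cores):
    """Expand the TT cores back into the dense matrix (testing only)."""
    W = cores[0]              # (1, m_1, n_1, r_1)
    for core in cores[1:]:
        # Merge the next core's modes: rows as (rows, m), columns as (cols, n).
        W = np.einsum('amnb,bpqc->ampnqc', W, core)
        a, m, p, n, q, c = W.shape
        W = W.reshape(a, m * p, n * q, c)
    return W[0, :, :, 0]

# Quick self-check on small, hypothetical shapes (M = N = 24, TT-ranks 2):
rng = np.random.default_rng(0)
ms, ns, rs = [2, 3, 4], [4, 3, 2], [1, 2, 2, 1]
cores = [rng.standard_normal((rs[k], ms[k], ns[k], rs[k + 1]))
         for k in range(3)]
x = rng.standard_normal(4 * 3 * 2)
assert np.allclose(tt_matvec(cores, x), tt_to_dense(cores) @ x)
```

The staged loop is also where the compression claim comes from: a dense M×N weight matrix stores M·N values, whereas the TT form stores only the small cores (sum of r_{k-1}·m_k·n_k·r_k entries), which for low ranks is orders of magnitude less.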