TIE: Energy-efficient Tensor Train-based Inference Engine for Deep Neural Network

Detailed bibliography
Published in: 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA), pp. 264-277
Main authors: Deng, Chunhua; Sun, Fangxuan; Qian, Xuehai; Lin, Jun; Wang, Zhongfeng; Yuan, Bo
Format: Conference paper
Language: English
Published: ACM, 01.06.2019
ISSN: 2575-713X
Online access: Get full text
Description
Summary: In the era of artificial intelligence (AI), deep neural networks (DNNs) have emerged as the most important and powerful AI technique. However, large DNN models are both storage and computation intensive, posing significant challenges for adopting DNNs in resource-constrained scenarios. Model compression therefore becomes a crucial technique for enabling wide deployment of DNNs. This paper advances the state of the art by considering tensor train (TT) decomposition, a very promising but not yet explored compression technique in the architecture domain, which features an extremely high compression ratio. The challenge, however, is that inference on TT-format DNN models inherently incurs a massive amount of redundant computation, causing significant energy consumption, so a straightforward application of TT decomposition is not feasible. To address this fundamental challenge, this paper develops a computation-efficient inference scheme for TT-format DNNs, which enjoys two key merits: 1) it achieves the theoretical limit on the number of multiplications, thus eliminating all redundant computations; and 2) its multi-stage processing scheme reduces the intensive memory accesses to the tensor cores, bringing significant energy savings. Based on this novel inference scheme, we develop TIE, an inference engine targeted at TT-format compressed DNNs. TIE is highly flexible, supporting different types of networks for different needs. A 16-processing-element (PE) prototype is implemented in 28nm CMOS technology. Operating at 1000 MHz, the TIE accelerator occupies 1.74 mm² and consumes 154.8 mW. Compared with EIE, TIE achieves 7.22x~10.66x better area efficiency and 3.03x~4.48x better energy efficiency on different workloads, respectively. Compared with CirCNN, TIE achieves 5.96x and 4.56x higher throughput and energy efficiency, respectively.
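The tensor train decomposition mentioned in the abstract factors a large weight matrix into a chain of small tensor cores, which is where the extremely high compression ratio comes from. The following is a minimal, hypothetical sketch (not the authors' TIE implementation) of the standard TT-SVD procedure applied to a fully-connected layer's weight matrix; the mode shapes and the rank cap are illustrative assumptions chosen only to show how the parameter count shrinks.

# Minimal sketch: TT decomposition of a fully-connected layer's weights via
# repeated SVD (TT-SVD). Mode shapes and max_rank are illustrative assumptions.
import numpy as np

def tt_decompose(weight, in_modes, out_modes, max_rank):
    """Factor a (prod(in_modes) x prod(out_modes)) matrix into TT cores."""
    d = len(in_modes)
    # Reshape the matrix into a 2d-way tensor and interleave input/output
    # modes, the usual layout for TT-format fully-connected layers.
    tensor = weight.reshape(list(in_modes) + list(out_modes))
    perm = [i for pair in zip(range(d), range(d, 2 * d)) for i in pair]
    tensor = tensor.transpose(perm).reshape(
        [m * n for m, n in zip(in_modes, out_modes)])
    cores, rank = [], 1
    for k in range(d - 1):
        # Unfold, truncate the SVD to max_rank, keep the left factor as a core.
        mat = tensor.reshape(rank * in_modes[k] * out_modes[k], -1)
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        new_rank = min(max_rank, len(s))
        cores.append(u[:, :new_rank].reshape(rank, in_modes[k], out_modes[k], new_rank))
        tensor = (np.diag(s[:new_rank]) @ vt[:new_rank]).reshape(new_rank, -1)
        rank = new_rank
    cores.append(tensor.reshape(rank, in_modes[-1], out_modes[-1], 1))
    return cores

# Illustrative example: a 4096x4096 layer with modes (8,8,8,8)x(8,8,8,8).
# With such a small rank cap the TT cores only approximate W; the point here
# is the parameter count, not reconstruction accuracy.
in_modes, out_modes, max_rank = (8, 8, 8, 8), (8, 8, 8, 8), 4
W = np.random.randn(int(np.prod(in_modes)), int(np.prod(out_modes))).astype(np.float32)
cores = tt_decompose(W, in_modes, out_modes, max_rank)
tt_params = sum(c.size for c in cores)
print(f"dense params: {W.size}, TT params: {tt_params}, "
      f"ratio: {W.size / tt_params:.0f}x")

Running this sketch stores roughly 2.5K core parameters in place of the 16.8M dense weights, which illustrates (under these assumed shapes and ranks) why the paper treats the high compression ratio as the key attraction of the TT format, and why the remaining challenge is the redundant computation incurred when contracting the cores during inference.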
DOI:10.1145/3307650.3322258