GAAS: An Efficient Group Associated Architecture and Scheduler Module for Sparse CNN Accelerators

Bibliographic Details
Published in: IEEE transactions on computer-aided design of integrated circuits and systems, Vol. 39, No. 12, pp. 5170-5182
Main Authors: Wang, Jingyu; Yuan, Zhe; Liu, Ruoyang; Feng, Xiaoyu; Du, Li; Yang, Huazhong; Liu, Yongpan
Format: Journal Article
Language: English
Published: New York: IEEE, 1 December 2020
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN:0278-0070, 1937-4151
Description
Summary: Convolutional neural networks (CNNs) have become powerful algorithms for a variety of tasks. Application-specific integrated circuits (ASICs) are widely used to accelerate CNNs on mobile platforms because of their high energy efficiency and performance. Meanwhile, CNNs have become much sparser with the development of network pruning algorithms. Recent works have employed different methods to improve the energy efficiency and performance of ASIC accelerators by exploiting the sparsity of CNNs. However, some of these methods suffer from large output memory overhead and from performance degradation induced by hash collisions. To overcome these problems, we propose GAAS: an efficient group associated architecture and scheduler module for sparse CNN accelerators. It achieves smaller output memory overhead and higher performance than the state-of-the-art accelerator. GAAS consists mainly of two parts: 1) an n-way group associated architecture to reduce the output memory overhead and 2) a scheduler module to improve the performance. In addition, a load-balancing algorithm is proposed and implemented in the scheduler module to improve performance by reducing the hash collision rate. To demonstrate the efficiency of GAAS, we implement a 4-way image-principal associated architecture with a 16×16 PE array and the scheduler module based on our proposed method. Experimental results on AlexNet, VGG16, ResNet18, and MobileNet show that GAAS reduces the output memory overhead by 50% and improves performance by 1.53×, 1.62×, 1.46×, and 1.55×, respectively.
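The summary attributes part of the speedup to a scheduler that lowers the hash collision rate. The sketch below is not the GAAS algorithm, which this record does not describe. It is a generic illustration, under assumed conditions, of why reordering output writes can remove collisions: output addresses are hashed to banks with a simple modulo, each bank is assumed to accept one write per cycle, and a greedy pass regroups the writes so that no cycle targets the same bank twice. The names `stall_cycles` and `greedy_schedule`, the bank count, and the issue width are all hypothetical.

```python
from collections import Counter, defaultdict


def stall_cycles(cycles, num_banks):
    """Count extra cycles caused by bank conflicts.

    Assumption: each bank accepts one write per cycle, so a cycle whose
    busiest bank receives k writes costs k - 1 additional cycles.
    """
    extra = 0
    for addrs in cycles:
        per_bank = Counter(a % num_banks for a in addrs)  # modulo hash (assumed)
        if per_bank:
            extra += max(per_bank.values()) - 1
    return extra


def greedy_schedule(addrs, num_banks, width):
    """Regroup writes so each cycle has at most one write per bank.

    Hypothetical greedy policy: in every cycle, serve the banks with the
    longest pending queues first, issuing at most `width` writes.
    """
    queues = defaultdict(list)
    for a in addrs:
        queues[a % num_banks].append(a)

    cycles = []
    while any(queues.values()):
        pending = [b for b in queues if queues[b]]
        pending.sort(key=lambda b: -len(queues[b]))
        cycles.append([queues[b].pop() for b in pending[:width]])
    return cycles


if __name__ == "__main__":
    addrs = [0, 4, 8, 1, 2, 3, 5, 6]      # made-up output addresses
    naive = [addrs[:4], addrs[4:]]        # issue in arrival order, 4 per cycle
    balanced = greedy_schedule(addrs, num_banks=4, width=4)
    print(len(naive) + stall_cycles(naive, 4))
    print(len(balanced) + stall_cycles(balanced, 4))
```

With these made-up addresses, arrival-order issue takes 5 cycles in total (2 issue cycles plus 3 stalls), while the regrouped order takes 3 cycles with no stalls, which equals the length of the longest per-bank queue.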
DOI:10.1109/TCAD.2020.2966451