MARS: Multimacro Architecture SRAM CIM-Based Accelerator With Co-Designed Compressed Neural Networks

Convolutional neural networks (CNNs) play a key role in deep learning applications. However, the large storage overheads and the substantial computational cost of CNNs are problematic in hardware accelerators. Computing-in-memory (CIM) architecture has demonstrated great potential to effectively com...

Celý popis

Uložené v:
Podrobná bibliografia
Vydané v:IEEE transactions on computer-aided design of integrated circuits and systems Ročník 41; číslo 5; s. 1550 - 1562
Hlavní autori: Sie, Syuan-Hao, Lee, Jye-Luen, Chen, Yi-Ren, Yeh, Zuo-Wei, Li, Zhaofang, Lu, Chih-Cheng, Hsieh, Chih-Cheng, Chang, Meng-Fan, Tang, Kea-Tiong
Médium: Journal Article
Jazyk:English
Vydavateľské údaje: New York IEEE 01.05.2022
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Predmet:
ISSN:0278-0070, 1937-4151
On-line prístup:Získať plný text
Tagy: Pridať tag
Žiadne tagy, Buďte prvý, kto otaguje tento záznam!
Popis
Shrnutí:Convolutional neural networks (CNNs) play a key role in deep learning applications. However, the large storage overheads and the substantial computational cost of CNNs are problematic in hardware accelerators. Computing-in-memory (CIM) architecture has demonstrated great potential to effectively compute large-scale matrix-vector multiplication. However, the intensive multiply and accumulation (MAC) operations executed on CIM macros remain bottlenecks for further improvement of energy efficiency and throughput. To reduce computational costs, model compression is a widely studied method to shrink the model size. For implementation in a static random access memory (SRAM) CIM-based accelerator, the model compression algorithm must consider the hardware limitations of CIM macros. In this study, a software and hardware co-design approach is proposed to design MARS, a SRAM-based CIM (SRAM CIM)-based CNN accelerator that can utilize multiple SRAM CIM macros as processing units and support a sparse CNN, and an SRAM CIM-aware model compression algorithm that considers a CIM architecture to reduce the number of network parameters. With the proposed hardware software co-designed method, MARS can reach over 700 and 400 FPS for CIFAR-10 and CIFAR-100, respectively. In addition, MARS achieves 52.3 and 88.2 TOPs/W in VGG16 and ResNet18, respectively.
Bibliografia:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:0278-0070
1937-4151
DOI:10.1109/TCAD.2021.3082107