An Accelerator Design Using a MTCA Decomposition Algorithm for CNNs

Bibliographic Details
Published in:	Sensors (Basel, Switzerland) Vol. 20; no. 19; p. 5558
Main Authors:	Zhao, Yunping; Lu, Jianzhuang; Chen, Xiaowen
Format:	Journal Article
Language:	English
Published:	Basel: MDPI AG, 28.09.2020
Subjects:	Algorithms; CNNs accelerator; Efficiency; Field programmable gate arrays; hardware architecture; parallel computing algorithm; Software
ISSN:	1424-8220

Description
Summary:	Due to the high throughput and high computing capability of convolutional neural networks (CNNs), researchers are paying increasing attention to the design of CNN hardware accelerator architectures. Accordingly, in this paper, we propose a block parallel computing algorithm based on the matrix transformation computing algorithm (MTCA) to realize the convolution expansion and resolve the blocking problem of the intermediate matrix. It enables a highly parallel implementation on hardware. Moreover, we provide a specific calculation method for the optimal partition of the matrix multiplication to optimize performance. In our evaluation, the proposed method saves more than 60% of hardware storage space compared with the im2col (image to column) approach. In the case of large-scale convolutions, it saves nearly 82% of storage space. Under the accelerator architecture designed in this paper, we achieve a performance of 26.7 to 33.4 GFLOPS (depending on convolution type) on an FPGA (Field Programmable Gate Array) by reducing bandwidth and improving data reusability. It is 1.2×–4.0× faster than memory-efficient convolution (MEC) and im2col, respectively, and represents an effective solution for a large-scale convolution accelerator.
DOI:	10.3390/s20195558
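
The summary measures storage savings against the im2col baseline. As a rough illustration of why that baseline is storage-hungry, the sketch below lowers a single-channel image into the im2col intermediate matrix and compares its size with the input. It is not the paper's MTCA or its block partition, which this record does not describe. The function names, stride 1, and the absence of padding are assumptions made for the example, and the convolution is the unflipped (cross-correlation) form commonly used in CNNs.

```python
def im2col(image, k):
    """Lower a 2-D image (list of rows) into a matrix whose rows are
    flattened k x k patches. Assumes stride 1 and no padding."""
    h, w = len(image), len(image[0])
    out_h, out_w = h - k + 1, w - k + 1
    return [
        [image[i + di][j + dj] for di in range(k) for dj in range(k)]
        for i in range(out_h)
        for j in range(out_w)
    ]


def conv_via_im2col(image, kernel):
    """Convolve by multiplying the im2col matrix with the flattened kernel."""
    k = len(kernel)
    flat_kernel = [v for row in kernel for v in row]
    out_w = len(image[0]) - k + 1
    flat = [
        sum(a * b for a, b in zip(patch, flat_kernel))
        for patch in im2col(image, k)
    ]
    # Reshape the flat result back into out_h rows of out_w values.
    return [flat[r * out_w:(r + 1) * out_w] for r in range(len(flat) // out_w)]


def conv_direct(image, kernel):
    """Reference sliding-window convolution with no intermediate matrix."""
    k = len(kernel)
    out_h, out_w = len(image) - k + 1, len(image[0]) - k + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(k) for dj in range(k))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def im2col_storage_ratio(h, w, k):
    """Entries in the im2col matrix divided by entries in the input image."""
    return (h - k + 1) * (w - k + 1) * k * k / (h * w)
```

For a 4 x 4 input and a 3 x 3 kernel, the intermediate matrix holds 36 entries against 16 in the input, a ratio of 2.25. The ratio approaches k squared as the image grows, and this duplication is the overhead that schemes such as MEC, and according to the summary the proposed method, aim to reduce.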