An Efficient CNN Accelerator Using Inter-Frame Data Reuse of Videos on FPGAs

Detailed bibliography

Published in:	IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 30, No. 11, pp. 1-14
Main authors:	Li, Shengzhao; Wang, Qin; Jiang, Jianfei; Sheng, Weiguang; Jing, Naifeng; Mao, Zhigang
Document type:	Journal article
Language:	English
Publisher:	The Institute of Electrical and Electronics Engineers, Inc. (IEEE), New York, 1 November 2022
Subjects:	Accelerators; Algorithms; Application specific integrated circuits; Artificial neural networks; Computer vision; Convolution; Convolutional neural network (CNN); Convolutional neural networks; Digital signal processing; Digital signal processors; Efficiency; Field programmable gate arrays; field-programmable gate array (FPGA) accelerator; incremental operation; input similarity; Microprocessors; Neural networks; Operators (mathematics); Quantization (signal); Signal processing algorithms; video applications; Videos; Winograd algorithm
ISSN:	1063-8210, 1557-9999

Description
Abstract:	Convolutional neural networks (CNNs) have had great success when applied to computer vision, and many application-specific integrated circuit (ASIC) and field-programmable gate array (FPGA) CNN accelerators have been proposed. These accelerators primarily focus on accelerating a single input and are not particularly optimized for video applications. In this article, we focus on the similarities between consecutive inputs in video, and we propose a YOLOv3-tiny CNN FPGA accelerator using incremental operation. The accelerator can skip the convolution operation on data that is similar between consecutive inputs. We also use the Winograd algorithm to optimize the conv3×3 operator in the YOLOv3-tiny network, further improving the accelerator's efficiency. Experimental results show that our accelerator achieves 74.2 frames/s on ImageNet ILSVRC2015. Compared with the original network without the Winograd algorithm and incremental operation, our design provides a 4.10× speedup. Compared with other YOLO network FPGA accelerators applied to video applications, our design provides 3.13× to 18.34× higher normalized digital signal processor (DSP) efficiency and 1.10× to 14.2× higher energy efficiency.
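The incremental operation summarized above rests on the linearity of convolution: conv(x_t) = conv(x_{t-1}) + conv(x_t - x_{t-1}), so only inputs that changed between frames need new multiplications. The pure-Python sketch below illustrates that idea under stated assumptions: a single channel, stride 1, no padding, integer (quantized) activations so the update is exact, and skipping only pixels that are exactly unchanged. The function names `conv2d_valid` and `incremental_conv2d` are hypothetical; this is not the authors' hardware design, whose similarity criterion and dataflow the abstract does not describe.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def conv2d_valid(x: Matrix, k: Matrix) -> Matrix:
    """Single-channel, stride-1, unpadded 2-D convolution (CNN-style cross-correlation)."""
    kh, kw = len(k), len(k[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    return [
        [sum(x[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
         for j in range(ow)]
        for i in range(oh)
    ]


def incremental_conv2d(prev_out: Matrix, prev_in: Matrix, cur_in: Matrix,
                       k: Matrix) -> Tuple[Matrix, int]:
    """Hypothetical helper: update the previous frame's output using only changed pixels.

    Relies on linearity: conv(cur) = conv(prev) + conv(cur - prev).
    Returns the updated output and the number of multiply-accumulates (MACs) spent.
    """
    kh, kw = len(k), len(k[0])
    oh, ow = len(prev_out), len(prev_out[0])
    out = [row[:] for row in prev_out]  # copy so the previous result is not mutated
    macs = 0
    for r, (prev_row, cur_row) in enumerate(zip(prev_in, cur_in)):
        for c, (p, q) in enumerate(zip(prev_row, cur_row)):
            delta = q - p
            if delta == 0:
                continue  # unchanged pixel: its contribution is already in prev_out
            # Scatter the delta to every output position whose kernel window covers (r, c).
            for a in range(kh):
                for b in range(kw):
                    i, j = r - a, c - b
                    if 0 <= i < oh and 0 <= j < ow:
                        out[i][j] += delta * k[a][b]
                        macs += 1
    return out, macs


if __name__ == "__main__":
    sobel_x = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
    frame0 = [[(3 * r + c) % 7 for c in range(6)] for r in range(6)]
    frame1 = [row[:] for row in frame0]
    frame1[2][3] += 5  # only one pixel differs between the two frames
    y0 = conv2d_valid(frame0, sobel_x)
    y1, macs = incremental_conv2d(y0, frame0, frame1, sobel_x)
    assert y1 == conv2d_valid(frame1, sobel_x)  # exact match with full recomputation
    print(macs, "MACs vs", len(y0) * len(y0[0]) * 9, "for a full recompute")
    # prints: 9 MACs vs 144 for a full recompute
```

Because the sketch uses integer activations, the incremental result matches full recomputation exactly; with floating-point data, or with a threshold that also skips small nonzero deltas, the result would only be approximate. The update also covers only the linear convolution; outputs after activation or pooling layers are not linear in the input and would need separate handling.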
DOI:	10.1109/TVLSI.2022.3151788