An Efficient CNN Accelerator Using Inter-Frame Data Reuse of Videos on FPGAs

Bibliographic details
Published in: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 30, No. 11, pp. 1-14
Main authors: Li, Shengzhao; Wang, Qin; Jiang, Jianfei; Sheng, Weiguang; Jing, Naifeng; Mao, Zhigang
Format: Journal Article
Language: English
Published: New York, IEEE, 1 November 2022
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN:1063-8210, 1557-9999
Description
Abstract: Convolutional neural networks (CNNs) have had great success when applied to computer vision technology, and many application-specific integrated circuit (ASIC) and field-programmable gate array (FPGA) CNN accelerators have been proposed. These accelerators primarily focus on the acceleration of a single input, and they are not particularly optimized for video applications. In this article, we focus on the similarities between continuous inputs in video, and we propose a YOLOv3-tiny CNN FPGA accelerator using incremental operation. The accelerator can skip the convolution operation of similar data between continuous inputs. We also use the Winograd algorithm to optimize the conv3×3 operator in the YOLOv3-tiny network to further improve the accelerator's efficiency. Experimental results show that our accelerator achieved 74.2 frames/s on ImageNet ILSVRC2015. Compared to the original network without the Winograd algorithm and incremental operation, our design provides a 4.10× speedup. When compared with other YOLO network FPGA accelerators applied to video applications, our design provided 3.13× to 18.34× the normalized digital signal processor (DSP) efficiency and 1.10× to 14.2× the energy efficiency.
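The abstract does not spell out how the incremental operation is implemented. As a rough illustration only, the sketch below assumes one common way to reuse inter-frame data: because convolution is linear, the output for the current frame equals the previous output plus the convolution of the frame difference, so pixels that did not change can be skipped. The function names (`conv2d_valid`, `incremental_conv2d`), the 5×5 frames, and the kernel are hypothetical and are not taken from the article.

```python
def conv2d_valid(image, kernel):
    """Dense 'valid' 2-D cross-correlation on lists of lists (reference version)."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            out[i][j] = sum(image[i + a][j + b] * kernel[a][b]
                            for a in range(kh) for b in range(kw))
    return out


def incremental_conv2d(prev_out, prev_frame, cur_frame, kernel):
    """Update prev_out using only the pixels that differ between the two frames.

    Assumption (not from the article): exact equality decides whether a pixel
    is 'similar'. Returns (new_out, number_of_changed_pixels).
    """
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(prev_out), len(prev_out[0])
    out = [row[:] for row in prev_out]
    changed = 0
    for r in range(len(cur_frame)):
        for c in range(len(cur_frame[0])):
            delta = cur_frame[r][c] - prev_frame[r][c]
            if delta == 0:
                continue  # unchanged pixel: no multiply-accumulate work needed
            changed += 1
            # Input pixel (r, c) feeds output (r - a, c - b) with weight kernel[a][b].
            for a in range(kh):
                for b in range(kw):
                    i, j = r - a, c - b
                    if 0 <= i < oh and 0 <= j < ow:
                        out[i][j] += delta * kernel[a][b]
    return out, changed


# Two hypothetical 5x5 frames that differ in a single pixel.
prev_frame = [[(r * 5 + c) % 7 for c in range(5)] for r in range(5)]
cur_frame = [row[:] for row in prev_frame]
cur_frame[2][3] += 4

kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]  # arbitrary 3x3 example kernel

prev_out = conv2d_valid(prev_frame, kernel)
new_out, changed = incremental_conv2d(prev_out, prev_frame, cur_frame, kernel)
```

The incremental result matches a full recomputation on the current frame while touching only the one changed pixel. The identity holds for a single linear convolution layer; nonlinear stages such as activation and pooling sit between layers in a real network, so how the accelerator carries the reuse across layers is beyond what this sketch shows.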
DOI:10.1109/TVLSI.2022.3151788