An Efficient CNN Accelerator Using Inter-Frame Data Reuse of Videos on FPGAs


Bibliographic details
Published in: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 30, No. 11, pp. 1-14
Main authors: Li, Shengzhao; Wang, Qin; Jiang, Jianfei; Sheng, Weiguang; Jing, Naifeng; Mao, Zhigang
Format: Journal Article
Language: English
Published: New York, IEEE, 1 November 2022
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN:1063-8210, 1557-9999
Description
Summary: Convolutional neural networks (CNNs) have had great success when applied to computer vision technology, and many application-specific integrated circuit (ASIC) and field-programmable gate array (FPGA) CNN accelerators have been proposed. These accelerators primarily focus on the acceleration of a single input, and they are not particularly optimized for video applications. In this article, we focus on the similarities between continuous inputs in video, and we propose a YOLOv3-tiny CNN FPGA accelerator using incremental operation. The accelerator can skip the convolution operation of similar data between continuous inputs. We also use the Winograd algorithm to optimize the conv3×3 operator in the YOLOv3-tiny network to further improve the accelerator's efficiency. Experimental results show that our accelerator achieved 74.2 frames/s on ImageNet ILSVRC2015. Compared to the original network without Winograd algorithm and incremental operation, our design provides a 4.10× speedup. When compared with other YOLO network FPGA accelerators applied to video applications, our design provided a 3.13×-18.34× normalized digital signal processor (DSP) efficiency and 1.10×-14.2× energy efficiency.
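
The idea of skipping convolution work for data that is unchanged between consecutive frames can be illustrated with a small software sketch. It relies only on the linearity of convolution: the output for the new frame equals the output for the previous frame plus the convolution of the per-pixel difference, so pixels whose difference is zero contribute nothing and can be skipped. This is an assumption-laden illustration, not the authors' hardware design; the function names `conv2d` and `incremental_conv2d`, the `threshold` parameter, and the single-channel nested-list layout are all hypothetical choices made for this example.

```python
def conv2d(x, k):
    """Plain 'valid' 2-D convolution (cross-correlation, as used in CNNs)."""
    kh, kw = len(k), len(k[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[i + a][j + b] * k[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(ow)]
            for i in range(oh)]


def incremental_conv2d(prev_out, prev_x, x, k, threshold=0):
    """Update prev_out for the new frame x, skipping unchanged input pixels.

    Returns (new_out, skipped), where skipped is the number of input pixels
    whose absolute difference was <= threshold (hypothetical parameter).
    With threshold=0 the result is exact; a larger value is approximate.
    """
    kh, kw = len(k), len(k[0])
    out = [row[:] for row in prev_out]  # start from the previous result
    oh, ow = len(out), len(out[0])
    skipped = 0
    for r in range(len(x)):
        for c in range(len(x[0])):
            delta = x[r][c] - prev_x[r][c]
            if abs(delta) <= threshold:
                skipped += 1  # no multiply-accumulates for this pixel
                continue
            # Scatter the difference into every output that reads input (r, c).
            for a in range(kh):
                for b in range(kw):
                    i, j = r - a, c - b
                    if 0 <= i < oh and 0 <= j < ow:
                        out[i][j] += delta * k[a][b]
    return out, skipped


# Two 4x4 "frames" that differ in a single pixel, and a 3x3 kernel.
kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
frame0 = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
frame1 = [row[:] for row in frame0]
frame1[1][2] = 20

out0 = conv2d(frame0, kernel)
out1, skipped = incremental_conv2d(out0, frame0, frame1, kernel)
```

Here `out1` matches a full recomputation on `frame1`, while 15 of the 16 input pixels are skipped. In a real network the nonlinear layers between convolutions (activations, pooling) mean that differences cannot simply be passed from layer to layer, so each layer needs its own handling; the summary does not say how the accelerator deals with this.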
DOI:10.1109/TVLSI.2022.3151788