An Efficient CNN Accelerator Using Inter-Frame Data Reuse of Videos on FPGAs
| Published in: | IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 30, No. 11, pp. 1-14 |
|---|---|
| Authors: | , , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: IEEE, November 1, 2022; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Keywords: | |
| ISSN: | 1063-8210, 1557-9999 |
| Abstract: | Convolutional neural networks (CNNs) have had great success in computer vision, and many application-specific integrated circuit (ASIC) and field-programmable gate array (FPGA) CNN accelerators have been proposed. These accelerators primarily focus on accelerating a single input and are not particularly optimized for video applications. In this article, we focus on the similarities between consecutive inputs in video and propose a YOLOv3-tiny CNN FPGA accelerator that uses incremental operation. The accelerator can skip the convolution operation for data that is similar between consecutive inputs. We also use the Winograd algorithm to optimize the conv3×3 operator in the YOLOv3-tiny network, further improving the accelerator's efficiency. Experimental results show that our accelerator achieved 74.2 frames/s on ImageNet ILSVRC2015. Compared with the original network without the Winograd algorithm and incremental operation, our design provides a 4.10× speedup. Compared with other YOLO network FPGA accelerators applied to video applications, our design provides 3.13× to 18.34× higher normalized digital signal processor (DSP) efficiency and 1.10× to 14.2× higher energy efficiency. |
|---|---|
| DOI: | 10.1109/TVLSI.2022.3151788 |
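The abstract says the Winograd algorithm is used to optimize the conv3×3 operator but does not give the accelerator's tile configuration. As a hedged illustration only, the Python sketch below shows the standard one-dimensional minimal filtering algorithm F(2, 3), which computes two outputs of a 3-tap filter with four multiplications instead of six. The two-dimensional F(2×2, 3×3) variant commonly used for 3×3 kernels nests this transform in both dimensions. The function names (`transform_filter`, `winograd_f23`, `conv1d_direct`, `conv1d_winograd`) and the choice of F(2, 3) are assumptions for illustration and are not taken from the paper.

```python
"""Winograd minimal filtering F(2, 3): a 1-D illustration.

Hypothetical sketch, not the accelerator's datapath. A 3-tap
correlation (the "convolution" used in CNNs) normally costs 6
multiplications per 2 outputs; F(2, 3) needs only 4.
"""


def transform_filter(g):
    """Precompute the transformed filter (done once per kernel)."""
    g0, g1, g2 = g
    return (g0, (g0 + g1 + g2) / 2, (g0 - g1 + g2) / 2, g2)


def winograd_f23(d, u):
    """Compute 2 outputs from a 4-element input tile d and transformed filter u."""
    d0, d1, d2, d3 = d
    # The only 4 per-tile multiplications.
    m1 = (d0 - d2) * u[0]
    m2 = (d1 + d2) * u[1]
    m3 = (d2 - d1) * u[2]
    m4 = (d1 - d3) * u[3]
    return [m1 + m2 + m3, m2 - m3 - m4]


def conv1d_direct(x, g):
    """Reference 'valid' correlation: y[i] = sum_k x[i + k] * g[k]."""
    return [sum(x[i + k] * g[k] for k in range(3)) for i in range(len(x) - 2)]


def conv1d_winograd(x, g):
    """'Valid' correlation computed tile by tile with F(2, 3).

    Assumes len(x) - 2 is even so the output splits into 2-element tiles.
    """
    n_out = len(x) - 2
    if n_out % 2:
        raise ValueError("this sketch needs an even number of outputs")
    u = transform_filter(g)
    y = []
    # Consecutive 4-element input tiles overlap by 2 elements.
    for i in range(0, n_out, 2):
        y.extend(winograd_f23(x[i:i + 4], u))
    return y
```

For example, `conv1d_winograd([1, 2, 3, 4, 5, 6], [1, 2, 3])` matches the direct result `[14, 20, 26, 32]` (returned as floats). The filter transform is computed once per kernel, so its divisions by 2 do not add to the per-tile multiplication count.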
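The record does not describe how the incremental operation skips "similar data between continuous inputs." One way to do this, assumed here purely for illustration, exploits the linearity of convolution: the new output equals the previous output plus the convolution of the frame difference, so input pixels whose change is at most a threshold require no multiply-accumulate (MAC) operations. The names `conv2d_valid` and `incremental_conv2d` and the `threshold` parameter are hypothetical, and this is not presented as the paper's method.

```python
"""Incremental (delta) convolution between consecutive frames.

Hypothetical sketch: relies only on the linearity of convolution,
    conv(x_t) = conv(x_{t-1}) + conv(x_t - x_{t-1}),
so input pixels whose value changed by at most `threshold` are
skipped and cost no multiply-accumulates (MACs).
"""


def conv2d_valid(x, k):
    """Reference 'valid' 2-D correlation of image x with kernel k."""
    kh, kw = len(k), len(k[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
             for j in range(ow)] for i in range(oh)]


def incremental_conv2d(prev_out, prev_x, x, k, threshold=0):
    """Update prev_out (= conv2d_valid(prev_x, k)) for the new frame x.

    Scatters each significant input change into the outputs it affects.
    Returns (new_out, number_of_macs_performed).
    With threshold > 0 the result is an approximation.
    """
    kh, kw = len(k), len(k[0])
    oh, ow = len(prev_out), len(prev_out[0])
    out = [row[:] for row in prev_out]
    macs = 0
    for r in range(len(x)):
        for c in range(len(x[0])):
            delta = x[r][c] - prev_x[r][c]
            if abs(delta) <= threshold:
                continue  # similar data: skip every MAC for this pixel
            # Input (r, c) feeds output (i, j) through kernel tap (r - i, c - j).
            for a in range(kh):
                for b in range(kw):
                    i, j = r - a, c - b
                    if 0 <= i < oh and 0 <= j < ow:
                        out[i][j] += delta * k[a][b]
                        macs += 1
    return out, macs
```

Changing a single interior pixel of a 4×4 frame convolved with a 3×3 kernel triggers 4 MACs instead of the 36 needed to recompute the 2×2 output directly. Because activations such as ReLU are not linear, a scheme like this would have to cache each convolution layer's pre-activation output and reapply the activation after the update.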