An Efficient CNN Accelerator Using Inter-Frame Data Reuse of Videos on FPGAs
| Published in: | IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 30, No. 11, pp. 1-14 |
|---|---|
| Main authors: | , , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: IEEE, 1 November 2022; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Subjects: | |
| ISSN: | 1063-8210, 1557-9999 |
| Summary: | Convolutional neural networks (CNNs) have had great success when applied to computer vision technology, and many application-specific integrated circuit (ASIC) and field-programmable gate array (FPGA) CNN accelerators have been proposed. These accelerators primarily focus on the acceleration of a single input, and they are not particularly optimized for video applications. In this article, we focus on the similarities between continuous inputs in video, and we propose a YOLOv3-tiny CNN FPGA accelerator using incremental operation. The accelerator can skip the convolution operation of similar data between continuous inputs. We also use the Winograd algorithm to optimize the conv3×3 operator in the YOLOv3-tiny network to further improve the accelerator's efficiency. Experimental results show that our accelerator achieved 74.2 frames/s on ImageNet ILSVRC2015. Compared to the original network without Winograd algorithm and incremental operation, our design provides a 4.10× speedup. When compared with other YOLO network FPGA accelerators applied to video applications, our design provided a 3.13×-18.34× normalized digital signal processor (DSP) efficiency and 1.10×-14.2× energy efficiency. |
|---|---|
| DOI: | 10.1109/TVLSI.2022.3151788 |