A YOLO v3-tiny FPGA Architecture using a Reconfigurable Hardware Accelerator for Real-time Region of Interest Detection

With the recent advances in the fields of machine learning, neural networks and deep-learning algorithms have become a prevalent subject of computer vision. Especially for tasks like object classification and detection Convolutional Neu-ronal Networks (CNNs) have surpassed the previous traditional a...

Celý popis

Uloženo v:
Podrobná bibliografie
Vydáno v:Proceedings (Euromicro Conference on Digital Systems Design) s. 84 - 92
Hlavní autoři: Herrmann, Viktor, Knapheide, Justin, Steinert, Fritjof, Stabernack, Benno
Médium: Konferenční příspěvek
Jazyk:angličtina
Vydáno: IEEE 01.08.2022
Témata:
ISSN:2771-2508
On-line přístup:Získat plný text
Tagy: Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
Popis
Shrnutí:With the recent advances in the fields of machine learning, neural networks and deep-learning algorithms have become a prevalent subject of computer vision. Especially for tasks like object classification and detection Convolutional Neu-ronal Networks (CNNs) have surpassed the previous traditional approaches. In addition to these applications, CNNs can recently also be found in other applications. For example the parametrization of video encoding algorithms as used in our example is quite a new application domain. Especially CNN's high recognition rate makes them particularly suitable for finding Regions of Interest (ROIs) in video sequences, which can be used for adapting the data rate of the compressed video stream accordingly. On the downside, these CNN require an immense amount of processing power and memory bandwidth. Object detection networks such as You Only Look Once (YOLO) try to balance processing speed and accuracy but still rely on power-hungry GPUs to meet real-time requirements. Specialized hardware like Field Programmable Gate Array (FPGA) implementations proved to strongly reduce this problem while still providing sufficient computational power. In this paper we propose a flexible architecture for object detection hardware acceleration based on the YOLO v3-tiny model. The reconfigurable accelerator comprises a high throughput convolution engine, custom blocks for all additional CNN operations and a programmable control unit to manage on-chip execution. The model can be deployed without significant changes based on 32-bit floating point values and without further methods that would reduce the model accuracy. Experimental results show a high capability of the design to accelerate the object detection task with a processing time of 27.5 ms per frame. It is thus real-time-capable for 30 FPS applications at frequency of 200 MHz.
ISSN:2771-2508
DOI:10.1109/DSD57027.2022.00021