High-performance video content recognition with long-term recurrent convolutional network for FPGA

Bibliographic details
Published in:	International Conference on Field-programmable Logic and Applications, pp. 1-4
Main authors:	Xiaofan Zhang, Xinheng Liu, Anand Ramachandran, Chuanhao Zhuge, Shibin Tang, Peng Ouyang, Zuofu Cheng, Kyle Rupnow, Deming Chen
Format:	Conference paper
Language:	English
Published:	Ghent University 01.09.2017
Subjects:	Field programmable gate arrays; IP networks; Mathematical model; Neural networks; Optimization; Quantization (signal); Resource management
ISSN:	1946-1488

Description
Summary:	FPGA is a promising candidate for the acceleration of Deep Neural Networks (DNN) with improved latency and energy consumption compared to CPU and GPU-based implementations. DNNs use sequences of layers of regular computation that are well suited for HLS-based design for FPGA. However, optimizing large neural networks under resource constraints is still a key challenge. HLS must manage on-chip computation, buffering resources, and off-chip memory accesses to minimize the total latency. In this paper, we present a design framework for DNNs that uses highly configurable IPs for neural network layers together with a new design space exploration engine for Resource Allocation Management (REALM). We also carry out efficient memory subsystem design and fixed-point weight re-training to further improve our FPGA solution. We demonstrate our design framework on the Long-term Recurrent Convolution Network for video inputs. Our implementation on a Xilinx VC709 board achieves 3.1X speedup compared to an NVIDIA K80 and 4.75X speedup compared to an Intel Xeon with 17.5X lower energy per image.
DOI:	10.23919/FPL.2017.8056833