High-performance video content recognition with long-term recurrent convolutional network for FPGA

Bibliographic Details
Published in: International Conference on Field Programmable Logic and Applications (FPL), pp. 1-4
Main Authors: Xiaofan Zhang, Xinheng Liu, Anand Ramachandran, Chuanhao Zhuge, Shibin Tang, Peng Ouyang, Zuofu Cheng, Kyle Rupnow, Deming Chen
Format: Conference Proceeding
Language: English
Published: Ghent University, 01.09.2017
ISSN: 1946-1488
Description
Summary: FPGA is a promising candidate for accelerating Deep Neural Networks (DNNs), with improved latency and energy consumption compared to CPU- and GPU-based implementations. DNNs use sequences of layers of regular computation that are well suited to HLS-based design for FPGA. However, optimizing large neural networks under resource constraints remains a key challenge: HLS must manage on-chip computation, buffering resources, and off-chip memory accesses to minimize total latency. In this paper, we present a design framework for DNNs that uses highly configurable IPs for neural network layers together with a new design space exploration engine for Resource Allocation Management (REALM). We also carry out efficient memory subsystem design and fixed-point weight re-training to further improve our FPGA solution. We demonstrate our design framework on the Long-term Recurrent Convolutional Network (LRCN) for video inputs. Our implementation on a Xilinx VC709 board achieves a 3.1X speedup compared to an NVIDIA K80 GPU and a 4.75X speedup compared to an Intel Xeon CPU, with 17.5X lower energy per image.
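For intuition, the resource allocation search the abstract describes can be pictured as splitting a fixed on-chip resource budget across pipelined layers so that the slowest stage is as fast as possible. The Python sketch below is a hypothetical, minimal illustration of that idea only: the DSP budget, the per-layer MAC counts, and the one-MAC-per-DSP-per-cycle cost model are assumptions made here, not the paper's actual REALM engine or its measurements.

```python
# Illustrative sketch: brute-force design space exploration that splits a
# DSP budget across pipelined layers to minimize the slowest stage's cycles.
# All numbers and the cost model are assumptions, not the paper's REALM engine.
from itertools import product

DSP_BUDGET = 2800            # assumed DSP budget for a VC709-class device
MACS = [90e6, 150e6, 60e6]   # hypothetical MAC counts for three layers

def pipeline_latency(alloc):
    # In a layer pipeline, throughput is bound by the slowest stage;
    # assume each DSP completes one MAC per cycle, so cycles ~ MACs / DSPs.
    return max(macs / dsps for macs, dsps in zip(MACS, alloc))

best = None
step = 100  # coarse search grid to keep the enumeration small
choices = range(step, DSP_BUDGET + 1, step)
for alloc in product(choices, repeat=len(MACS)):
    if sum(alloc) > DSP_BUDGET:
        continue  # skip allocations that exceed the budget
    latency = pipeline_latency(alloc)
    if best is None or latency < best[0]:
        best = (latency, alloc)

print(f"best allocation {best[1]} -> {best[0]:.0f} cycles per stage")
```

A real exploration engine would also have to model buffering resources and off-chip memory bandwidth, and would prune the design space rather than enumerate it exhaustively.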
DOI: 10.23919/FPL.2017.8056833