High-performance video content recognition with long-term recurrent convolutional network for FPGA

Bibliographic Details
Published in: International Conference on Field Programmable Logic and Applications (FPL), pp. 1-4
Main Authors: Xiaofan Zhang, Xinheng Liu, Anand Ramachandran, Chuanhao Zhuge, Shibin Tang, Peng Ouyang, Zuofu Cheng, Kyle Rupnow, Deming Chen
Format: Conference Proceeding
Language: English
Published: Ghent University, 01.09.2017
ISSN: 1946-1488
Description
Summary: FPGA is a promising candidate for accelerating Deep Neural Networks (DNNs), with improved latency and energy consumption compared to CPU- and GPU-based implementations. DNNs use sequences of layers of regular computation that are well suited to HLS-based design for FPGA. However, optimizing large neural networks under resource constraints remains a key challenge: HLS must manage on-chip computation, buffering resources, and off-chip memory accesses to minimize total latency. In this paper, we present a design framework for DNNs that uses highly configurable IPs for neural network layers together with a new design space exploration engine for Resource Allocation Management (REALM). We also carry out efficient memory subsystem design and fixed-point weight re-training to further improve our FPGA solution. We demonstrate our design framework on the Long-term Recurrent Convolutional Network (LRCN) for video inputs. Our implementation on a Xilinx VC709 board achieves a 3.1X speedup compared to an NVIDIA K80 GPU and a 4.75X speedup compared to an Intel Xeon CPU, with 17.5X lower energy per image.
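For intuition, the resource allocation search the abstract describes can be pictured as splitting a fixed on-chip resource budget across pipelined layers so that the slowest stage is as fast as possible. The Python sketch below is a hypothetical, minimal illustration of that idea only: the DSP budget, the per-layer MAC counts, and the one-MAC-per-DSP-per-cycle cost model are assumptions made here, not the paper's actual REALM engine or its measurements.

```python
# Illustrative sketch: brute-force design space exploration that splits a
# DSP budget across pipelined layers to minimize the slowest stage's cycles.
# All numbers and the cost model are assumptions, not the paper's REALM engine.
from itertools import product

DSP_BUDGET = 2800            # assumed DSP budget for a VC709-class device
MACS = [90e6, 150e6, 60e6]   # hypothetical MAC counts for three layers

def pipeline_latency(alloc):
    # In a layer pipeline, throughput is bound by the slowest stage;
    # assume each DSP completes one MAC per cycle, so cycles ~ MACs / DSPs.
    return max(macs / dsps for macs, dsps in zip(MACS, alloc))

best = None
step = 100  # coarse search grid to keep the enumeration small
choices = range(step, DSP_BUDGET + 1, step)
for alloc in product(choices, repeat=len(MACS)):
    if sum(alloc) > DSP_BUDGET:
        continue  # skip allocations that exceed the budget
    latency = pipeline_latency(alloc)
    if best is None or latency < best[0]:
        best = (latency, alloc)

print(f"best allocation {best[1]} -> {best[0]:.0f} cycles per stage")
```

A real exploration engine would also have to model buffering resources and off-chip memory bandwidth, and would prune the design space rather than enumerate it exhaustively.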
DOI: 10.23919/FPL.2017.8056833