MALOC: A Fully Pipelined FPGA Accelerator for Convolutional Neural Networks With All Layers Mapped on Chip

Detailed Bibliography
Published in:	IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 37, No. 11, pp. 2601-2612
Main authors:	Gong, Lei; Wang, Chao; Li, Xi; Chen, Huaping; Zhou, Xuehai
Medium:	Journal Article
Language:	English
Published:	New York: IEEE, 1 November 2018 (The Institute of Electrical and Electronics Engineers, Inc.)
Subjects:	Accelerators; Artificial neural networks; Computational efficiency; Computational modeling; Computer architecture; Computing time; Convolutional neural network (CNN); Convolutional neural networks; Design space exploration (DSE); Efficiency; Embedded systems; Field programmable gate arrays; Field-programmable gate array (FPGA)-based accelerator; Gate arrays; Hardware; Neural networks; Optimization; Pipeline; Pipelines; Programming framework; Redundancy; Redundancy elimination; Space exploration; System-on-chip
ISSN:	0278-0070, 1937-4151

Description
Abstract:	Recently, field-programmable gate arrays (FPGAs) have been widely used to implement hardware accelerators for convolutional neural networks (CNNs). However, most existing accelerators follow the same design approach as their ASIC counterparts: operations from all layers are mapped onto the same hardware units, which are time-multiplexed across layers. This approach does not take full advantage of the reconfigurability and customizability of FPGAs and degrades computational efficiency to some extent. In this paper, we propose a new architecture for FPGA-based CNN accelerators that maps each layer to its own on-chip unit, with all units working concurrently as a pipeline. We also propose a comprehensive mapping and optimization methodology built on a roofline-model-oriented optimization model, which achieves maximum resource utilization as well as optimal computational efficiency. In addition, to ease the programming burden, we provide a design framework that offers developers a one-stop flow for generating the accelerator with our optimization methodology. We evaluate our proposal by implementing several modern CNN models on Xilinx Zynq-7020 and Virtex-7 690t FPGA platforms. Experimental results show that our implementations achieve a peak performance of 910.2 GOPS on the Virtex-7 690t and an energy efficiency of 36.36 GOP/s/W on the Zynq-7020, both superior to previous approaches.
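The abstract combines two quantitative ideas: a roofline bound on attainable performance, and a pipeline in which every layer has its own on-chip unit, so steady-state throughput is set by the slowest stage. The Python sketch below illustrates both under simplifying assumptions; it is not the authors' optimization model or design framework. It assumes each hypothetical processing element (PE) completes one operation per cycle, summarizes off-chip traffic by a single operations-per-byte ratio, and uses a simple greedy rule (give each spare PE to the current bottleneck stage) in place of the paper's methodology. All function names and numbers are hypothetical.

```python
import math
from typing import List


def attainable_gops(peak_gops: float, bandwidth_gbs: float, ops_per_byte: float) -> float:
    """Classic roofline bound: performance is capped either by peak compute
    or by memory bandwidth (GB/s) times operational intensity (ops/byte)."""
    return min(peak_gops, bandwidth_gbs * ops_per_byte)


def allocate_pes(layer_ops: List[int], total_pes: int) -> List[int]:
    """Hypothetical greedy split of a PE budget across pipeline stages,
    one stage per layer.

    Assumption: a stage with p PEs needs ops / p cycles per input image.
    Every stage starts with one PE; each spare PE then goes to the stage
    that is currently the bottleneck.
    """
    if total_pes < len(layer_ops):
        raise ValueError("need at least one PE per layer stage")
    alloc = [1] * len(layer_ops)
    for _ in range(total_pes - len(layer_ops)):
        # The bottleneck is the stage with the most work per PE.
        slowest = max(range(len(layer_ops)), key=lambda i: layer_ops[i] / alloc[i])
        alloc[slowest] += 1
    return alloc


def pipeline_interval(layer_ops: List[int], alloc: List[int]) -> int:
    """Cycles between consecutive outputs once the layer pipeline is full:
    the slowest stage sets the pace."""
    return max(math.ceil(ops / pes) for ops, pes in zip(layer_ops, alloc))


if __name__ == "__main__":
    ops = [100, 300, 600]                         # made-up per-layer operation counts
    alloc = allocate_pes(ops, 10)
    print(alloc, pipeline_interval(ops, alloc))   # [1, 3, 6] 100
    print(attainable_gops(100.0, 4.0, 10.0))      # 40.0 (memory-bound)
```

With these made-up numbers the greedy split gives the three stages 1, 3, and 6 PEs, so each stage needs 100 cycles per image and none waits on a slower neighbor. The roofline call shows a layer limited by bandwidth (4 GB/s times 10 ops/byte gives 40 GOPS) rather than by the 100 GOPS compute peak.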
DOI:	10.1109/TCAD.2018.2857078