Heterogeneous FPGA-Based Cost-Optimal Design for Timing-Constrained CNNs
Saved in:
| Published in: | IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 37, No. 11, pp. 2542-2554 |
|---|---|
| Main authors: | , , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.11.2018 |
| Subjects: | |
| ISSN: | 0278-0070, 1937-4151 |
| Online access: | Full text |
| Abstract: | Field programmable gate arrays (FPGAs) have become one of the most popular platforms for implementing convolutional neural networks (CNNs) due to their high performance and cost efficiency; however, limited by on-chip resources, existing single-FPGA architectures cannot fully exploit the parallelism in CNNs. In this paper, we explore heterogeneous FPGA-based designs that effectively leverage both task and data parallelism, such that the resulting system achieves the minimum cost while satisfying timing constraints. To maximize task parallelism, we investigate two critical problems: 1) buffer placement, i.e., where to place buffers to partition CNNs into pipeline stages; and 2) task assignment, i.e., which type of FPGA to use to implement the different CNN layers. We first formulate the system-level optimization problem as a mixed integer linear programming (MILP) model. Then, we propose an efficient dynamic programming algorithm to obtain the optimal solutions (an illustrative sketch of such a dynamic program appears after this record). On top of that, we devise an efficient algorithm that exploits data parallelism within CNN layers to further improve cost efficiency. Evaluations on well-known CNNs demonstrate that the proposed techniques obtain an average 30.82% reduction in system cost under the same timing constraint, and an average 1.5x speedup in performance under the same cost budget, compared with state-of-the-art techniques. |
|---|---|
| DOI: | 10.1109/TCAD.2018.2857098 |
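The abstract describes a system-level problem: split the CNN layers into consecutive pipeline stages (buffer placement) and assign each stage an FPGA type (task assignment), minimizing total device cost while meeting a timing constraint. The paper solves this with an MILP model and a dynamic programming algorithm; the snippet below is only a minimal sketch of one way such a DP could be structured, assuming that a stage's latency on a given FPGA type is the sum of precomputed per-layer latencies and that the timing constraint bounds every stage's latency. It ignores the paper's data-parallel refinement within layers, and the names `min_system_cost`, `layer_latency`, and `fpga_cost` are hypothetical, not taken from the paper.

```python
from math import inf

def min_system_cost(layer_latency, fpga_cost, T_max):
    """
    Illustrative DP sketch (not the paper's algorithm) for:
      - partitioning n CNN layers into consecutive pipeline stages
        (buffer placement = where the stage boundaries fall), and
      - assigning each stage one FPGA type (task assignment),
    minimizing total device cost while every stage meets T_max.

    layer_latency[f][l]: assumed precomputed latency of layer l on FPGA type f
    fpga_cost[f]:        cost of one FPGA of type f
    """
    n = len(layer_latency[0])
    num_types = len(fpga_cost)

    # dp[i] = minimum cost to implement layers 0..i-1;
    # choice[i] remembers the stage boundary and FPGA type achieving it.
    dp = [inf] * (n + 1)
    choice = [None] * (n + 1)
    dp[0] = 0.0

    for i in range(1, n + 1):
        for j in range(i):                 # layers j..i-1 form one stage
            for f in range(num_types):
                stage_lat = sum(layer_latency[f][j:i])
                if stage_lat <= T_max and dp[j] + fpga_cost[f] < dp[i]:
                    dp[i] = dp[j] + fpga_cost[f]
                    choice[i] = (j, f)

    if dp[n] == inf:
        return None                        # no feasible partition

    # Reconstruct (layers-in-stage, FPGA type) assignments.
    stages, i = [], n
    while i > 0:
        j, f = choice[i]
        stages.append((list(range(j, i)), f))
        i = j
    stages.reverse()
    return dp[n], stages


# Tiny made-up example: 4 layers, 2 FPGA types, per-stage timing bound 10.
if __name__ == "__main__":
    latencies = [[6, 5, 7, 4],    # cheaper, slower FPGA type
                 [3, 2, 4, 2]]    # pricier, faster FPGA type
    costs = [1.0, 2.5]
    print(min_system_cost(latencies, costs, T_max=10))
```

This sketch runs in O(n^2 * |F|) time for n layers and |F| FPGA types; the paper's actual formulation and its MILP/DP details should be taken from the article itself (DOI above).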