FCNNLib: A Flexible Convolution Algorithm Library for Deep Learning on FPGAs

Bibliographic details
Published in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 41, no. 8, pp. 2546-2559
Main authors: Liang, Yun; Xiao, Qingcheng; Lu, Liqiang; Xie, Jiaming
Format: Journal Article
Language: English
Published: New York, IEEE, August 1, 2022
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 0278-0070, 1937-4151
Description
Summary: Convolution is computationally intensive and demands high compute capability. Among hardware platforms, the field-programmable gate array (FPGA) has emerged as a promising solution because of its abundant parallelism and energy efficiency. Convolution can also be implemented with different algorithms, including conventional, general matrix-matrix multiplication (GEMM), Winograd, and fast Fourier transform (FFT) algorithms, which differ in arithmetic complexity, resource requirements, and other respects. Different convolutional neural network (CNN) models have different topologies and structures and therefore favor different convolution algorithms. In response, software libraries such as cuDNN provide a variety of computational primitives to support these algorithms. However, supporting such libraries on FPGAs is challenging. First, multiple algorithms can share the FPGA resources spatially as well as temporally, which introduces either reconfiguration overhead or resource underutilization. Second, FPGA implementation remains a significant challenge for library developers because it typically requires substantial specialized hardware knowledge. In this article, we propose FCNNLib, an efficient and scalable convolution algorithm library for FPGAs. To coordinate multiple convolution algorithms on FPGAs, we develop three scheduling schemes: 1) spatial; 2) temporal; and 3) hybrid, which exhibit different tradeoffs between latency and throughput. We explore these schemes by balancing the reconfiguration overhead, the resource utilization, and the optimization objectives of the CNNs. We then provide efficient and tunable algorithm templates that allow performance tuning through performance and resource models. For users, FCNNLib exposes a set of interfaces that support high-level application design. We demonstrate the usability of FCNNLib with state-of-the-art CNNs. FCNNLib achieves up to 44.6× and 1.76× the energy efficiency of software libraries for CPUs and GPUs, respectively, in various scenarios.
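The differences among the convolution algorithms named in the summary can be illustrated with a minimal sketch in plain Python. It is not FCNNLib code and does not reflect the library's interfaces; the function names (conv2d_direct, conv2d_gemm, winograd_f23, conv1d_direct) are hypothetical. The sketch assumes a single channel, stride 1, no padding, and the CNN convention of sliding the kernel without flipping it. It shows the conventional sliding-window form, the same computation restructured as a matrix product via im2col, and the standard Winograd F(2,3) identity for a 3-tap filter.

```python
# Illustrative sketch only: hypothetical helper names, not FCNNLib interfaces.

def conv2d_direct(image, kernel):
    """Conventional sliding-window convolution (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(ow)]
            for r in range(oh)]


def conv2d_gemm(image, kernel):
    """Same result via im2col: each window becomes a row, then one matrix-vector product."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    # im2col: one row per output pixel, holding the unrolled input window.
    cols = [[image[r + i][c + j] for i in range(kh) for j in range(kw)]
            for r in range(oh) for c in range(ow)]
    flat_kernel = [w for row in kernel for w in row]
    flat_out = [sum(a * b for a, b in zip(patch, flat_kernel)) for patch in cols]
    return [flat_out[r * ow:(r + 1) * ow] for r in range(oh)]


def conv1d_direct(d, g):
    """Reference 1-D sliding dot product used to check the Winograd result."""
    return [sum(d[i + k] * g[k] for k in range(len(g)))
            for i in range(len(d) - len(g) + 1)]


def winograd_f23(d, g):
    """Winograd F(2,3): two outputs of a 3-tap filter from four inputs.

    With the filter-side terms precomputed once per filter, only the four
    products m1..m4 involve input data, versus six in the direct form.
    """
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * ((g0 + g1 + g2) / 2)
    m3 = (d2 - d1) * ((g0 - g1 + g2) / 2)
    m4 = (d1 - d3) * g2
    return [m1 + m2 + m3, m2 - m3 - m4]


if __name__ == "__main__":
    image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
    print(conv2d_direct(image, kernel) == conv2d_gemm(image, kernel))
    print(winograd_f23([1, 2, 3, 4], [2, 4, 6]), conv1d_direct([1, 2, 3, 4], [2, 4, 6]))
```

All three forms compute the same outputs but trade resources differently: the GEMM form spends extra memory on the unrolled windows to gain a regular matrix product, while the Winograd form spends extra additions to save multiplications. Differences of this kind are what make one algorithm a better fit than another for a given layer.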
DOI: 10.1109/TCAD.2021.3108065