Automatic CNN Model Partitioning for GPU/FPGA-based Embedded Heterogeneous Accelerators using Geometric Programming

Bibliographic details
Published in: Journal of signal processing systems, Vol. 95, No. 10, pp. 1203-1218
Main authors: Carballo-Hernández, Walther; Pelcat, Maxime; Berry, François
Format: Journal Article
Language: English
Published: New York, Springer US, 01.10.2023
Springer Nature B.V
Springer
ISSN:1939-8018, 1939-8115
Online access: Full text
Description
Abstract: Graphics Processing Unit (GPU), dedicated Application Specific Integrated Circuit (ASIC) and Field Programmable Gate Array (FPGA) accelerators are currently platforms of choice for porting Convolutional Neural Networks (CNNs). In this work, an automated Central Processing Unit (CPU)-GPU-FPGA partitioning selection is proposed for a given CNN layer. It is shown that using a Generalized Geometric Programming (GGP) optimization problem formulation, the CPU-GPU-FPGA partitioning problem can be modeled by considering a set of system performance metrics and constraints. Each metric is expressed in a posynomial form depending on CNN hyperparameters and architecture resource models. As for the partitioning method, the state-of-the-art techniques covered are: tiling, grouped convolution and fused-layer. The proposed analytical formalization is then employed to derive a set of objective functions and constraints as a GGP problem. It is demonstrated that it is possible to relax some problem constraints by including a penalization term, and reduce the problem to multiple simpler Geometric Programming (GP) sub-problems. Experimental results targeting an embedded FPGA-GPU platform with CNN layer configurations from state-of-the-art CNN models (AlexNet, VGG16 and ResNet18) show that the simplified problem is solvable in polynomial time with a speed-up gain and energy reduction of around 20% and 15%, respectively, when compared against an arbitrary balanced partitioning. If the models for objective and constraints functions preserve the posynomial form and log-log convexity, it is demonstrated that GGP is an efficient optimization solution to the Design Space Exploration (DSE) problem.
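The abstract's reliance on posynomial form and log-log convexity can be illustrated with a minimal sketch. The code below is not taken from the article: the function names, the two variables and every coefficient and exponent in `TERMS` are hypothetical, chosen only to show the general property. A posynomial is a sum of terms c_k * prod_i x_i^a_ki with c_k > 0 and x_i > 0; after substituting y = log(x) and taking the logarithm of the result, it becomes a convex function of y, which is what makes a GP efficiently solvable. The sketch checks that convexity numerically with a midpoint test on random point pairs.

```python
import math
import random


def posynomial(terms, x):
    """Evaluate sum_k c_k * prod_i x_i**a_ki for c_k > 0 and x_i > 0."""
    total = 0.0
    for coeff, exponents in terms:
        if coeff <= 0:
            raise ValueError("posynomial coefficients must be positive")
        total += coeff * math.prod(xi ** a for xi, a in zip(x, exponents))
    return total


def log_log(terms, y):
    """Posynomial in log-log form: F(y) = log f(exp(y))."""
    return math.log(posynomial(terms, [math.exp(v) for v in y]))


def midpoint_gap(terms, y1, y2):
    """Convexity gap (F(y1) + F(y2)) / 2 - F((y1 + y2) / 2); never negative for a convex F."""
    mid = [(a + b) / 2 for a, b in zip(y1, y2)]
    return 0.5 * (log_log(terms, y1) + log_log(terms, y2)) - log_log(terms, mid)


# Hypothetical two-variable cost model (illustrative values only, not from the article):
# each entry is (coefficient, (exponent of x0, exponent of x1)).
TERMS = [
    (2.0, (1.0, -1.0)),
    (0.5, (-0.5, 2.0)),
    (3.0, (0.0, 0.0)),
]

random.seed(0)
gaps = []
for _ in range(1000):
    y1 = [random.uniform(-2, 2) for _ in range(2)]
    y2 = [random.uniform(-2, 2) for _ in range(2)]
    gaps.append(midpoint_gap(TERMS, y1, y2))

min_gap = min(gaps)
print("smallest midpoint gap:", min_gap)
```

A passing midpoint test on random samples is only a sanity check, not a proof. Negative or fractional exponents are allowed, but a negative coefficient would break the posynomial form, which is why `posynomial` rejects it.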
DOI:10.1007/s11265-023-01898-0