Automatic CNN Model Partitioning for GPU/FPGA-based Embedded Heterogeneous Accelerators using Geometric Programming

Bibliographic details
Published in:	Journal of Signal Processing Systems, Vol. 95, No. 10, pp. 1203-1218
Main authors:	Carballo-Hernández, Walther; Pelcat, Maxime; Berry, François
Format:	Journal Article
Language:	English
Published:	New York: Springer US (Springer Nature B.V.; Springer), 1 October 2023
Keywords:	Accelerators; Application specific integrated circuits; Artificial neural networks; Central processing units; Circuits and Systems; Co-design; Communication; Computer Imaging; Computer Science; Constraints; Convexity; CPUs; Deep learning; Design optimization; Electrical Engineering; Embedded systems; Energy consumption; Engineering; Field programmable gate arrays; Graphics processing units; Image Processing and Computer Vision; Linear programming; Mathematical programming; Optimization techniques; Partitioning; Pattern Recognition; Pattern Recognition and Graphics; Performance measurement; Polynomials; Signal, Image and Speech Processing; Tiling; Vision; Workloads; Convolutional Neural Network; Embedded design; Heterogeneous platform; Mathematical optimization; Geometric programming
ISSN:	1939-8018, 1939-8115

Description
Abstract:	Graphics Processing Unit (GPU), dedicated Application Specific Integrated Circuit (ASIC) and Field Programmable Gate Array (FPGA) accelerators are currently the platforms of choice for porting Convolutional Neural Networks (CNNs). In this work, an automated Central Processing Unit (CPU)-GPU-FPGA partitioning selection is proposed for a given CNN layer. It is shown that, using a Generalized Geometric Programming (GGP) optimization problem formulation, the CPU-GPU-FPGA partitioning problem can be modeled by considering a set of system performance metrics and constraints. Each metric is expressed in posynomial form as a function of CNN hyperparameters and architecture resource models. The state-of-the-art partitioning techniques covered are tiling, grouped convolution and fused-layer. The proposed analytical formalization is then employed to derive a set of objective functions and constraints as a GGP problem. It is demonstrated that some problem constraints can be relaxed by including a penalization term, which reduces the problem to multiple simpler Geometric Programming (GP) sub-problems. Experimental results targeting an embedded FPGA-GPU platform with CNN layer configurations from state-of-the-art CNN models (AlexNet, VGG16 and ResNet18) show that the simplified problem is solvable in polynomial time, with a speed-up gain and energy reduction of around 20% and 15%, respectively, when compared against an arbitrary balanced partitioning. It is demonstrated that, if the models for the objective and constraint functions preserve the posynomial form and log-log convexity, GGP is an efficient optimization solution to the Design Space Exploration (DSE) problem.
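
To make the posynomial and log-log convexity terms used in the abstract concrete, here is a minimal sketch. It is not the authors' model: the one-variable cost, the coefficients `A_COMPUTE` and `B_TRANSFER`, and the reading of `x` as a tile size are all hypothetical. The cost is a posynomial (positive coefficients, real exponents, positive variable) that is not convex in `x` but becomes convex after the substitution `x = exp(y)` and taking the logarithm, so a simple ternary search over `y` is enough to minimize it.

```python
import math

# Hypothetical coefficients, chosen only for illustration (not from the paper).
A_COMPUTE = 8.0   # weight of the term that shrinks as the tile size x grows
B_TRANSFER = 2.0  # weight of the term that grows with the tile size x


def cost(x: float) -> float:
    """Toy posynomial: positive coefficients, real exponents, x > 0."""
    return A_COMPUTE * x ** -1.0 + B_TRANSFER * x ** 0.5


def log_cost(y: float) -> float:
    """Log of the cost after the change of variable x = exp(y)."""
    return math.log(cost(math.exp(y)))


def minimize_log_cost(lo: float, hi: float, iters: int = 200) -> float:
    """Ternary search over y; valid because log_cost is convex in y."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if log_cost(m1) < log_cost(m2):
            hi = m2  # the minimizer cannot lie to the right of m2
        else:
            lo = m1  # the minimizer cannot lie to the left of m1
    return 0.5 * (lo + hi)


# Search x in [1e-3, 1e3] by searching y = log(x) over the matching interval.
y_opt = minimize_log_cost(math.log(1e-3), math.log(1e3))
x_opt = math.exp(y_opt)
print(f"x_opt = {x_opt:.4f}, cost = {cost(x_opt):.4f}")
```

For these assumed coefficients, setting the derivative of the cost to zero gives x = (2 · 8 / 2)^(2/3) = 4 with a cost of 6, which the search should reproduce to within numerical tolerance. Real GP instances have several variables and posynomial constraints and are normally handed to a convex solver; the sketch only shows why the logarithmic change of variable makes the problem tractable.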
DOI:	10.1007/s11265-023-01898-0