Automatic CNN Model Partitioning for GPU/FPGA-based Embedded Heterogeneous Accelerators using Geometric Programming

Bibliographic details
Published in:	Journal of Signal Processing Systems, Volume 95, Issue 10, pp. 1203-1218
Main authors:	Carballo-Hernández, Walther; Pelcat, Maxime; Berry, François
Format:	Journal Article
Language:	English
Published:	New York: Springer US, 1 October 2023 (Springer Nature B.V.; Springer)
Subjects:	Accelerators; Application specific integrated circuits; Artificial neural networks; Central processing units; Circuits and Systems; Co-design; Communication; Computer Imaging; Computer Science; Constraints; Convexity; CPUs; Deep learning; Design optimization; Electrical Engineering; Embedded systems; Energy consumption; Engineering; Field programmable gate arrays; Graphics processing units; Image Processing and Computer Vision; Linear programming; Mathematical programming; Optimization techniques; Partitioning; Pattern Recognition; Pattern Recognition and Graphics; Performance measurement; Polynomials; Signal, Image and Speech Processing; Tiling; Vision; Workloads; Convolutional Neural Network; Embedded design; Heterogeneous platform; Mathematical optimization; Geometric programming
ISSN:	1939-8018, 1939-8115

Description
Summary:	Graphics Processing Unit (GPU), dedicated Application Specific Integrated Circuit (ASIC) and Field Programmable Gate Array (FPGA) accelerators are currently platforms of choice for porting Convolutional Neural Networks (CNNs). In this work, an automated Central Processing Unit (CPU)-GPU-FPGA partitioning selection is proposed for a given CNN layer. It is shown that using a Generalized Geometric Programming (GGP) optimization problem formulation, the CPU-GPU-FPGA partitioning problem can be modeled by considering a set of system performance metrics and constraints. Each metric is expressed in a posynomial form depending on CNN hyperparameters and architecture resource models. As for the partitioning method, the state-of-the-art techniques covered are: tiling, grouped convolution and fused-layer. The proposed analytical formalization is then employed to derive a set of objective functions and constraints as a GGP problem. It is demonstrated that it is possible to relax some problem constraints by including a penalization term, and reduce the problem to multiple simpler Geometric Programming (GP) sub-problems. Experimental results targeting an embedded FPGA-GPU platform with CNN layer configurations from state-of-the-art CNN models (AlexNet, VGG16 and ResNet18) show that the simplified problem is solvable in polynomial time with a speed-up gain and energy reduction of around 20% and 15%, respectively, when compared against an arbitrary balanced partitioning. If the models for objective and constraint functions preserve the posynomial form and log-log convexity, it is demonstrated that GGP is an efficient optimization solution to the Design Space Exploration (DSE) problem.
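Illustrative note (not part of the catalogued abstract): the summary relies on the fact that a posynomial becomes convex after a logarithmic change of variables. The following is a minimal sketch of that property only, not the authors' formulation. The one-variable cost model, the coefficients `C_COMPUTE` and `C_TRANSFER`, and all function names are hypothetical and chosen purely for illustration.

```python
import math


def posynomial(x, terms):
    """Evaluate sum_k c_k * x**a_k for x > 0, with every coefficient c_k > 0."""
    return sum(c * x ** a for c, a in terms)


def log_posynomial(y, terms):
    """Log-log form F(y) = log f(exp(y)); this log-sum-exp is convex in y."""
    return math.log(sum(c * math.exp(a * y) for c, a in terms))


def minimize_convex(func, lo, hi, tol=1e-10):
    """Golden-section search; valid because func is convex, hence unimodal."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if func(c) < func(d):
            b = d  # the minimum lies in [a, d]
        else:
            a = c  # the minimum lies in [c, b]
    return (a + b) / 2


# Hypothetical cost model (an assumption, not taken from the paper):
# a compute term that grows with tile size and a transfer term that shrinks.
C_COMPUTE = 2.0
C_TRANSFER = 50.0
TERMS = [(C_COMPUTE, 1.0), (C_TRANSFER, -1.0)]  # (coefficient, exponent) pairs

# Minimize in the log domain, then map the result back to the original variable.
y_opt = minimize_convex(
    lambda y: log_posynomial(y, TERMS), math.log(1e-3), math.log(1e3)
)
tile_opt = math.exp(y_opt)
cost_opt = posynomial(tile_opt, TERMS)
print(tile_opt, cost_opt)
```

For this toy model the minimizer can be checked by hand: setting the derivative of `C_COMPUTE * x + C_TRANSFER / x` to zero gives `x = sqrt(C_TRANSFER / C_COMPUTE)`. A real GGP instance would have several variables and posynomial constraints and would normally be handed to a dedicated convex solver.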
DOI:	10.1007/s11265-023-01898-0