Automatic CNN Model Partitioning for GPU/FPGA-based Embedded Heterogeneous Accelerators using Geometric Programming


Bibliographic Details
Published in: Journal of signal processing systems, Vol. 95, No. 10, pp. 1203-1218
Main Authors: Carballo-Hernández, Walther; Pelcat, Maxime; Berry, François
Format: Journal Article
Language: English
Published: New York Springer US 01.10.2023
Springer Nature B.V
Springer
ISSN:1939-8018, 1939-8115
Description
Summary: Graphics Processing Unit (GPU), dedicated Application Specific Integrated Circuit (ASIC) and Field Programmable Gate Array (FPGA) accelerators are currently platforms of choice for porting Convolutional Neural Networks (CNNs). In this work, an automated Central Processing Unit (CPU)-GPU-FPGA partitioning selection is proposed for a given CNN layer. It is shown that using a Generalized Geometric Programming (GGP) optimization problem formulation, the CPU-GPU-FPGA partitioning problem can be modeled by considering a set of system performance metrics and constraints. Each metric is expressed in a posynomial form depending on CNN hyperparameters and architecture resource models. As for the partitioning method, the state-of-the-art techniques covered are: tiling, grouped convolution and fused-layer. The proposed analytical formalization is then employed to derive a set of objective functions and constraints as a GGP problem. It is demonstrated that it is possible to relax some problem constraints by including a penalization term, and reduce the problem to multiple simpler Geometric Programming (GP) sub-problems. Experimental results targeting an embedded FPGA-GPU platform with CNN layer configurations from state-of-the-art CNN models (AlexNet, VGG16 and ResNet18) show that the simplified problem is solvable in polynomial time with a speed-up gain and energy reduction of around 20% and 15%, respectively, when compared against an arbitrary balanced partitioning. If the models for the objective and constraint functions preserve the posynomial form and log-log convexity, it is demonstrated that GGP is an efficient optimization solution to the Design Space Exploration (DSE) problem.
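
The summary relies on two properties, posynomial form and log-log convexity. The sketch below illustrates them on a toy problem and is not the model from the article: the one-variable cost 64/t + 4t, the variable name t (a stand-in for something like a tile size) and all helper names are hypothetical. It evaluates a posynomial, applies the change of variables t = exp(y) under which the logarithm of a posynomial becomes convex, and minimizes the result with a plain ternary search.

```python
import math


def posynomial(coeffs, exponents, x):
    """Evaluate sum_k c_k * prod_i x_i ** a_ki, with all c_k > 0 and x_i > 0."""
    return sum(
        c * math.prod(xi ** a for xi, a in zip(x, exps))
        for c, exps in zip(coeffs, exponents)
    )


def log_objective(y, coeffs, exponents):
    """Objective after substituting x = exp(y).

    The log of a posynomial in exp(y) is a log-sum-exp of affine
    functions of y, which is convex (the log-log convexity property).
    """
    return math.log(posynomial(coeffs, exponents, [math.exp(y)]))


def minimize_convex_1d(f, lo, hi, tol=1e-10):
    """Ternary search; valid because a convex f is unimodal on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


# Hypothetical cost model (an assumption for illustration only):
# 64 / t models an overhead that shrinks with t, 4 * t one that grows with t.
COEFFS = [64.0, 4.0]
EXPONENTS = [[-1.0], [1.0]]

y_opt = minimize_convex_1d(
    lambda y: log_objective(y, COEFFS, EXPONENTS),
    math.log(1e-3),
    math.log(1e3),
)
t_opt = math.exp(y_opt)
cost_opt = posynomial(COEFFS, EXPONENTS, [t_opt])
print(f"t = {t_opt:.4f}, cost = {cost_opt:.4f}")
```

For this toy cost the analytic minimizer is t = sqrt(64/4) = 4 with cost 32, which gives a reference for the numerical result. A problem with several variables and posynomial constraints would normally go to a dedicated GP solver rather than a hand-written search.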
DOI:10.1007/s11265-023-01898-0