Compressing and Fine-tuning DNNs for Efficient Inference in Mobile Device-Edge Continuum

Published in: 2024 IEEE International Mediterranean Conference on Communications and Networking (MeditCom), pp. 305-310
Authors: Singh, Gurtaj; Chukhno, Olga; Campolo, Claudia; Molinaro, Antonella; Chiasserini, Carla Fabiana
Medium: Conference paper
Language: English
Publisher: IEEE, 08.07.2024
Summary: Pruning deep neural networks (DNNs) is a well-known technique that allows for a considerable reduction in inference cost. However, it may severely degrade the model's accuracy unless the model is properly fine-tuned, which may, in turn, increase computational cost and latency. Thus, when deploying a DNN in resource-constrained edge environments, it is critical to find the best trade-off between accuracy (hence, model complexity) on the one hand, and latency and energy consumption on the other. In this work, we explore the different options for deploying a machine learning pipeline, encompassing pruning, fine-tuning, and inference, across a mobile device requesting inference tasks and an edge server, while considering privacy constraints on the data used for fine-tuning. Our experimental analysis provides insights for an efficient allocation of the pipeline tasks between the network edge and the mobile device in terms of energy and network costs, as the target inference latency and accuracy vary. In particular, our results highlight that the higher the edge server load and the number of inference requests, the more convenient it becomes to deploy the entire pipeline at the mobile device using a pruned model, with a cost reduction of up to a factor of two compared to deploying the whole pipeline at the edge.
DOI: 10.1109/MeditCom61057.2024.10621155
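
To make the pruning-plus-fine-tuning step of the pipeline concrete, here is a minimal sketch using PyTorch's torch.nn.utils.prune API. The toy model, the 50% sparsity level, and the random stand-in data are illustrative assumptions, not the paper's actual architecture, dataset, or settings.

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy model standing in for the paper's DNN (architecture is an assumption).
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))

# Magnitude-based pruning: mask out the 50% smallest-magnitude weights
# in each Linear layer (illustrative sparsity level, not the paper's).
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)

# Fine-tuning pass to recover accuracy lost to pruning; random tensors
# stand in for the (possibly privacy-constrained) fine-tuning data.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
criterion = nn.CrossEntropyLoss()
x, y = torch.randn(32, 64), torch.randint(0, 10, (32,))
for _ in range(10):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()

# Bake the pruning masks into the weights before deployment
# (e.g., shipping the pruned model to the mobile device).
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.remove(module, "weight")

In the paper's terms, running this whole snippet on the handset corresponds to deploying the entire pipeline at the mobile device, whereas splitting pruning/fine-tuning (server) from inference (device) corresponds to the edge-assisted options compared in the experimental analysis.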