TCL: Time-Dependent Clustering Loss for Optimizing Post-Training Feature Map Quantization for Partitioned DNNs
Saved in:
| Published in: | IEEE Access, Vol. 13, pp. 103640 - 103648 |
|---|---|
| Main authors: | , , , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Piscataway: IEEE, 2025 (The Institute of Electrical and Electronics Engineers, Inc.) |
| Keywords: | |
| ISSN: | 2169-3536 |
| Online access: | Full text |
| Abstract: | This paper introduces an enhanced approach for deploying deep learning models on resource-constrained IoT devices by combining model partitioning, autoencoder-based compression, quantization with Time-Dependent Clustering Loss (TCL) regularization, and lossless compression, reducing communication overhead and minimizing latency while maintaining accuracy. The autoencoder compresses feature maps at the partitioning point before quantization, effectively reducing data size and preserving accuracy. TCL regularization clusters activations at the partitioning point to align with quantization levels, minimizing quantization error and preserving accuracy even under extremely low-bitwidth quantization. Our method is evaluated on classification models (ResNet-50, EfficientNetV2-S) and an object detection model (YOLOv10n) using the TinyImageNet-200 and Pascal VOC datasets. Deployed on a Raspberry Pi 4 B and a GPU, each model is tested across various partitioning points, quantization bit-widths (1-bit, 2-bit, and 3-bit), communication data rates (1 MB/s to 10 MB/s), and LZMA lossless compression. For a ResNet-50 partitioned after the convolutional stem block, the speed-up is 2.33× against a server-only solution and 1.85× against an all-in-node solution, with a minimal accuracy drop of less than one percentage point. The proposed framework offers a scalable solution for deploying high-performance AI models on IoT devices, extending the feasibility of real-time inference in resource-constrained environments. |
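The core intuition behind the clustering regularizer described above — penalizing activations that sit far from the quantizer's representable values so they incur little quantization error — can be illustrated with a minimal sketch. The function names and the exact loss form below are illustrative assumptions, not the paper's TCL formulation:

```python
def uniform_levels(bits):
    """Evenly spaced quantization levels on [0, 1] for a given bit-width,
    e.g. 2-bit -> [0, 1/3, 2/3, 1]."""
    n = 2 ** bits
    return [i / (n - 1) for i in range(n)]

def clustering_loss(activations, levels):
    """Mean squared distance from each activation to its nearest
    quantization level (hypothetical stand-in for TCL). Activations
    clustered on the levels score 0; off-grid activations are penalized."""
    return sum(min(abs(a - l) for l in levels) ** 2
               for a in activations) / len(activations)

levels = uniform_levels(2)
print(clustering_loss(levels, levels))      # activations exactly on levels -> 0.0
print(clustering_loss([0.5, 0.9], levels))  # off-grid activations -> positive
```

Adding such a term to the training loss would nudge the feature-map distribution at the partitioning point toward the quantization grid, which is why accuracy can survive even 1- to 3-bit quantization of the transmitted features.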
|---|---|
| DOI: | 10.1109/ACCESS.2025.3579107 |