Edge Deployable Distributed Evolutionary Optimization based Calibration method for Neural Quantization
Saved in:
| Published in: | Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998) pp. 6240 - 6244 |
|---|---|
| Main Authors: | , , |
| Format: | Conference Paper |
| Language: | English |
| Published: | IEEE, 14.04.2024 |
| Subjects: | |
| ISSN: | 2379-190X |
| Online Access: | Full text |
| Summary: | Accuracy drop in neural quantization is addressed in prior art through Post Training Quantization (PTQ) schemes such as Percentile and Range-based calibration, which remain sensitive to the data distribution. On the other hand, sophisticated methods that efficiently handle variability in the data require deployment on the embedded devices themselves, significantly increasing memory and latency. We solve this issue by translating PTQ into a non-linear programming problem, which is then efficiently solved block-wise in a distributed manner using an evolutionary algorithm. The quantized models are also deployed on the Galaxy S23 smartphone to measure on-device performance. MobileNetV2 and ResNet18 in Int8 precision incurred accuracy drops of only 0.33 and 0.03, respectively, which is the best among PTQ methods. Our approach is a first-of-its-kind hardware-agnostic high-accuracy PTQ method that allows the seamless deployment of quantized networks on embedded devices. |
|---|---|
| DOI: | 10.1109/ICASSP48485.2024.10446281 |
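To illustrate the general idea behind evolutionary calibration for PTQ, the following is a minimal sketch, not the authors' algorithm: a simple (mu + lambda) evolutionary search over a single clipping threshold that minimizes the quantization error of one layer's calibration activations. All names (`evolve_clip`, `quantize`, population sizes, mutation scale) are hypothetical choices for this example; the paper optimizes block-wise over networks and a richer parameter space.

```python
import random

def quantize(xs, clip, bits=8):
    # Symmetric uniform quantization: clip to [-clip, clip],
    # round to the signed integer grid, then dequantize.
    qmax = 2 ** (bits - 1) - 1
    scale = clip / qmax
    return [max(-qmax, min(qmax, round(x / scale))) * scale for x in xs]

def mse(a, b):
    # Mean squared error between original and dequantized values.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def evolve_clip(xs, pop_size=16, generations=30, seed=0):
    # Hypothetical (mu + lambda) evolutionary search over the clipping
    # threshold: keep the best quarter, refill with Gaussian mutations.
    rng = random.Random(seed)
    lo, hi = 1e-3, max(abs(x) for x in xs)
    pop = [rng.uniform(lo, hi) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=lambda c: mse(xs, quantize(xs, c)))
        parents = scored[: pop_size // 4]
        pop = parents + [
            max(lo, p * rng.gauss(1.0, 0.1))
            for p in parents
            for _ in range(3)
        ]
    return min(pop, key=lambda c: mse(xs, quantize(xs, c)))
```

On activations with outliers, the evolved threshold typically lands well below the naive min-max range, which is why search-based calibration can outperform Range-based PTQ; the paper applies this kind of search block-wise and in a distributed fashion rather than per scalar threshold.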