Edge Deployable Distributed Evolutionary Optimization based Calibration method for Neural Quantization
Saved in:
| Published in: | Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998) pp. 6240 - 6244 |
|---|---|
| Main Authors: | , , |
| Format: | Conference Paper |
| Language: | English |
| Published: | IEEE, 14.04.2024 |
| Subjects: | |
| ISSN: | 2379-190X |
| Online Access: | Full text |
| Summary: | Accuracy drop in neural quantization is addressed in prior art through Post Training Quantization (PTQ) schemes such as Percentile and Range-based calibration, which remain sensitive to the data distribution. On the other hand, sophisticated methods that efficiently handle variability in the data require deployment on the embedded devices themselves, significantly increasing memory and latency. We solve this issue by translating PTQ into a non-linear programming problem, which is then efficiently solved block-wise in a distributed manner using an evolutionary algorithm. The quantized models are also deployed on the Galaxy S23 smartphone to measure on-device performance. MobileNetV2 and ResNet18 in Int8 precision incurred accuracy drops of only 0.33 and 0.03, respectively, which is the best among PTQ methods. Our approach is a first-of-its-kind hardware-agnostic high-accuracy PTQ method that allows the seamless deployment of quantized networks on embedded devices. |
|---|---|
| DOI: | 10.1109/ICASSP48485.2024.10446281 |
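To illustrate the general idea behind evolutionary calibration for PTQ, the following is a minimal sketch, not the authors' algorithm: a simple (mu + lambda) evolutionary search over a single clipping threshold that minimizes the quantization error of one layer's calibration activations. All names (`evolve_clip`, `quantize`, population sizes, mutation scale) are hypothetical choices for this example; the paper optimizes block-wise over networks and a richer parameter space.

```python
import random

def quantize(xs, clip, bits=8):
    # Symmetric uniform quantization: clip to [-clip, clip],
    # round to the signed integer grid, then dequantize.
    qmax = 2 ** (bits - 1) - 1
    scale = clip / qmax
    return [max(-qmax, min(qmax, round(x / scale))) * scale for x in xs]

def mse(a, b):
    # Mean squared error between original and dequantized values.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def evolve_clip(xs, pop_size=16, generations=30, seed=0):
    # Hypothetical (mu + lambda) evolutionary search over the clipping
    # threshold: keep the best quarter, refill with Gaussian mutations.
    rng = random.Random(seed)
    lo, hi = 1e-3, max(abs(x) for x in xs)
    pop = [rng.uniform(lo, hi) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=lambda c: mse(xs, quantize(xs, c)))
        parents = scored[: pop_size // 4]
        pop = parents + [
            max(lo, p * rng.gauss(1.0, 0.1))
            for p in parents
            for _ in range(3)
        ]
    return min(pop, key=lambda c: mse(xs, quantize(xs, c)))
```

On activations with outliers, the evolved threshold typically lands well below the naive min-max range, which is why search-based calibration can outperform Range-based PTQ; the paper applies this kind of search block-wise and in a distributed fashion rather than per scalar threshold.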