Learning Low-Rank Representations for Model Compression
| Published in: | Proceedings of ... International Joint Conference on Neural Networks, pp. 1-9 |
|---|---|
| Main Authors: | , , |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 18.06.2023 |
| Subjects: | |
| ISSN: | 2161-4407 |
| Online Access: | Get full text |
| Summary: | Vector Quantization (VQ) is an appealing model compression method to obtain a tiny model with little accuracy loss. While methods to obtain better codebooks and codes under fixed clustering dimensionality have been extensively studied, optimizations via the reduction of subvector dimensionality have not been carefully considered. This paper reports our recent progress on model compression with the combination of dimensionality reduction and vector quantization, proposing Low-Rank Representation Vector Quantization (LR$^2$VQ). LR$^2$VQ joins low-rank representation with subvector clustering to construct a new kind of building block that is optimized by end-to-end training. In our method, the compression ratio is directly controlled by the dimensionality of the subvectors, and the final accuracy is solely determined by the clustering dimensionality $\tilde{d}$. We recognize $\tilde{d}$ as a trade-off between low-rank approximation error and clustering error and carry out both theoretical analysis and experimental observations that empower the estimation of a proper $\tilde{d}$ before fine-tuning. With a proper $\tilde{d}$, we evaluate LR$^2$VQ with ResNet-18/ResNet-50 on the ImageNet classification dataset, achieving 2.8%/1.0% top-1 accuracy improvements over the current state-of-the-art model compression algorithms at 43×/31× compression factors. |
|---|---|
| DOI: | 10.1109/IJCNN54540.2023.10191936 |
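
The summary describes combining dimensionality reduction with vector quantization of weight subvectors. As a rough illustration only, and not the authors' LR$^2$VQ implementation (which learns the representation end to end), the NumPy sketch below projects length-$d$ weight subvectors onto a $\tilde{d}$-dimensional basis obtained from an SVD and then quantizes the projected subvectors with plain k-means; the function name `low_rank_vq_sketch` and all parameter choices are hypothetical.

```python
# Minimal sketch of "low-rank projection + subvector vector quantization".
# This is an assumption-laden illustration, not the paper's LR^2VQ method.
import numpy as np

def low_rank_vq_sketch(W, d, d_tilde, n_codewords=256, n_iter=20, seed=0):
    """Split W into length-d subvectors, project them to d_tilde dims via SVD,
    and cluster the projected subvectors with k-means (Lloyd's algorithm)."""
    rng = np.random.default_rng(seed)
    sub = W.reshape(-1, d)                              # n subvectors of length d
    _, _, Vt = np.linalg.svd(sub, full_matrices=False)  # right-singular directions
    P = Vt[:d_tilde].T                                  # d x d_tilde low-rank basis
    Z = sub @ P                                         # projected subvectors (n x d_tilde)
    codebook = Z[rng.choice(len(Z), n_codewords, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distances between subvectors and codewords.
        dists = (Z ** 2).sum(1, keepdims=True) - 2.0 * Z @ codebook.T + (codebook ** 2).sum(1)
        codes = dists.argmin(1)
        for k in range(n_codewords):
            members = Z[codes == k]
            if len(members):
                codebook[k] = members.mean(0)
    # Decode and map back to the original d dimensions to reconstruct W.
    W_hat = (codebook[codes] @ P.T).reshape(W.shape)
    return codes, codebook, P, W_hat

if __name__ == "__main__":
    # Toy example: a 512x512 weight matrix, d=8 subvectors projected to d_tilde=4.
    W = np.random.randn(512, 512).astype(np.float32)
    codes, codebook, P, W_hat = low_rank_vq_sketch(W, d=8, d_tilde=4)
    err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
    print(f"relative reconstruction error: {err:.3f}")
```

In a scheme of this shape, the stored model reduces to the codes, the codebook, and the projection basis, which is consistent with the abstract's point that the compression ratio is governed by the subvector dimensionality while accuracy hinges on the choice of $\tilde{d}$.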