Learning Low-Rank Representations for Model Compression

Bibliographic Details
Published in: Proceedings of ... International Joint Conference on Neural Networks, pp. 1 - 9
Main Authors: Zhu, Zezhou; Dong, Yuan; Zhao, Zhong
Format: Conference Proceeding
Language: English
Published: IEEE 18.06.2023
ISSN: 2161-4407
Description
Summary: Vector Quantization (VQ) is an appealing model compression method for obtaining a tiny model with little accuracy loss. While methods for obtaining better codebooks and codes under a fixed clustering dimensionality have been extensively studied, optimization via reduction of the subvector dimensionality has not been carefully considered. This paper reports our recent progress on model compression combining dimensionality reduction with vector quantization, proposing Low-Rank Representation Vector Quantization (LR²VQ). LR²VQ joins low-rank representation with subvector clustering to construct a new kind of building block that is optimized by end-to-end training. In our method, the compression ratio can be directly controlled by the dimensionality of the subvectors, and the final accuracy is solely determined by the clustering dimensionality d̃. We recognize d̃ as a trade-off between low-rank approximation error and clustering error, and we carry out both theoretical analysis and experimental observations that allow a proper d̃ to be estimated before fine-tuning. With a proper d̃, we evaluate LR²VQ with ResNet-18/ResNet-50 on the ImageNet classification dataset, achieving 2.8%/1.0% top-1 accuracy improvements over current state-of-the-art model compression algorithms at 43×/31× compression factors.
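As a rough illustration of the recipe the summary describes (split a weight matrix into subvectors, map them to a lower clustering dimensionality d̃, then vector-quantize in that space), the following minimal sketch may help. It is not the authors' LR²VQ implementation: the function name lowrank_vq_compress, the PCA stand-in for the learned low-rank representation, and all shapes and hyperparameters (d, d̃, codebook size k) are illustrative assumptions only.

# Minimal sketch (not the authors' code): compress a 2-D weight matrix by
# (1) splitting its rows into subvectors of length d, (2) projecting them to a
# lower clustering dimensionality d_tilde with a low-rank basis (PCA here,
# whereas the paper learns the representation end-to-end), and
# (3) running k-means in that d_tilde-dimensional space.
import numpy as np
from sklearn.cluster import KMeans

def lowrank_vq_compress(W, d=8, d_tilde=4, k=256, seed=0):
    """Toy low-rank + vector-quantization compression of a weight matrix W."""
    out_f, in_f = W.shape
    assert in_f % d == 0, "in_features must be divisible by the subvector length d"
    subvecs = W.reshape(out_f * (in_f // d), d)        # (N, d) subvectors

    # Low-rank basis via PCA; only a stand-in to keep the sketch self-contained.
    mean = subvecs.mean(axis=0)
    _, _, Vt = np.linalg.svd(subvecs - mean, full_matrices=False)
    B = Vt[:d_tilde].T                                 # (d, d_tilde) projection basis

    Z = (subvecs - mean) @ B                           # (N, d_tilde) low-rank representations
    km = KMeans(n_clusters=k, n_init=4, random_state=seed).fit(Z)
    codes, codebook = km.labels_, km.cluster_centers_  # (N,), (k, d_tilde)

    # Decode: lift codebook entries back to d dimensions and reassemble W.
    W_hat = (codebook[codes] @ B.T + mean).reshape(out_f, in_f)
    return W_hat, codes, codebook, B

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(64, 128)).astype(np.float32)
    W_hat, codes, codebook, B = lowrank_vq_compress(W, d=8, d_tilde=4, k=64)
    print("reconstruction MSE:", float(np.mean((W - W_hat) ** 2)))

The sketch decodes back to a reconstructed weight matrix only to show the round trip; in LR²VQ the low-rank representation, codebook, and codes are optimized by end-to-end training rather than fixed by PCA and k-means as here.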
DOI: 10.1109/IJCNN54540.2023.10191936