An Empirical Study on Neural Networks Pruning: Trimming for Reducing Memory or Workload

Detailed bibliography
Published in: 2023 5th International Conference on Data-driven Optimization of Complex Systems (DOCS), pp. 1-7
Main authors: Xiao, Kaiwen; Cai, Xiaodong; Xu, Ke
Format: Conference paper
Language: English
Published: IEEE, 22.09.2023
Description
Summary: Most existing studies of neural network pruning consider only memory-based pruning strategies. However, pruning for computational workload is often more important in hardware deployments, where reducing model computation is the primary concern. In addition, most pruning schemes restore model accuracy during pruning at the cost of extra hyperparameters, longer training time, and greater training complexity. This work proposes a statistics-based, globally soft, iterative pruning scheme. With little extra computation, an extremely sparse model can be obtained without additional hyperparameters or extended training time. Moreover, this work proposes the concept of computational intensity to balance model memory and computational workload during pruning. With memory-oriented pruning, we achieve 303×, 100×, and 25× parameter compression on the LeNet-5 (MNIST), VGG (CIFAR-10), and AlexNet (ImageNet) models, respectively. In particular, combined with cluster quantization, the LeNet-5 model parameters can be compressed by 3232×. With workload-oriented pruning, we reduce computation by 7.6× on the AlexNet model without accuracy loss, significantly more than prior work. Finally, to verify the versatility of the pruning method, we also migrate it to object detection, achieving 10× parameter compression and 2.8× computation compression for YOLOv2 with an mAP reduction within 1%.
DOI: 10.1109/DOCS60977.2023.10294956
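
The abstract does not spell out the pruning criterion or the exact definition of computational intensity, so the Python sketch below is only an illustration of the two ideas as commonly understood: a statistics-driven global threshold with "soft" (recoverable) masking, and FLOPs-per-parameter as a per-layer intensity measure. All function names, the mean-plus-k-std criterion, and the PyTorch framing are assumptions for illustration, not the authors' implementation.

import torch
import torch.nn as nn

def global_threshold(model, k=1.0):
    # Assumed statistical criterion: a single global threshold computed from
    # the mean and standard deviation of all conv/linear weight magnitudes.
    w = torch.cat([m.weight.detach().abs().flatten()
                   for m in model.modules()
                   if isinstance(m, (nn.Conv2d, nn.Linear))])
    return w.mean() + k * w.std()

def soft_prune_step(model, k=1.0):
    # "Soft" pruning: sub-threshold weights are zeroed for the next forward
    # pass but remain trainable, so weights pruned by mistake can recover in
    # later iterations instead of being removed permanently.
    thr = global_threshold(model, k)
    for m in model.modules():
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            mask = (m.weight.detach().abs() >= thr).to(m.weight.dtype)
            m.weight.data.mul_(mask)

def computational_intensity(layer, out_hw=None):
    # Assumed definition: FLOPs (MACs) per parameter. A conv weight is reused
    # at every output position, so its intensity is out_h * out_w; a fully
    # connected weight is used once, so its intensity is 1. High-intensity
    # layers dominate workload, low-intensity layers dominate memory.
    params = layer.weight.numel()
    if isinstance(layer, nn.Conv2d):
        out_h, out_w = out_hw
        return (params * out_h * out_w) / params  # = out_h * out_w
    return 1.0

Under these assumptions, weighting each layer's pruning rate by its computational intensity is one plausible way the same iterative scheme could be steered toward either the memory-oriented or the workload-oriented results quoted in the abstract.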