An Empirical Study on Neural Networks Pruning: Trimming for Reducing Memory or Workload

Bibliographic Details
Published in: 2023 5th International Conference on Data-driven Optimization of Complex Systems (DOCS), pp. 1-7
Main Authors: Xiao, Kaiwen; Cai, Xiaodong; Xu, Ke
Format: Conference Proceeding
Language: English
Published: IEEE, 22.09.2023
Summary: Most existing studies on neural network pruning consider only memory-based pruning strategies. However, pruning for computational workload is often more important in hardware deployments, where the focus is on reducing model computation. In addition, most pruning schemes restore model accuracy during pruning at the cost of extra hyperparameters, longer training time, and greater training complexity. This work proposes a statistics-based, globally soft, iterative pruning scheme. With little extra computation, an extremely sparse model can be obtained without additional hyperparameters or extended training time. Moreover, this work introduces the concept of computational intensity to balance model memory and computational workload during pruning. With memory-oriented pruning, we achieve 303×, 100×, and 25× parameter compression on the LeNet-5 (MNIST), VGG (CIFAR-10), and AlexNet (ImageNet) models, respectively. In particular, combined with cluster quantization, the LeNet-5 model parameters can be compressed by 3232×. With workload-oriented pruning, we reduce computation by 7.6× on the AlexNet model without accuracy loss, significantly more than prior work. In addition, to verify the versatility of the pruning method, we also migrate the pruning task to object detection, achieving 10× parameter compression and 2.8× computation compression on YOLOv2 with an mAP drop of less than 1%.
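The abstract describes the scheme only at a high level. For intuition, below is a minimal PyTorch sketch of what one step of globally soft, magnitude-based pruning with an optional computational-intensity weighting could look like. The function name, the `intensity` map (e.g., FLOPs per parameter, which for a convolutional weight is roughly the output feature-map area), and the thresholding details are illustrative assumptions, not the authors' actual implementation.

```python
import torch

def global_soft_prune(model, sparsity, intensity=None):
    """One step of globally soft magnitude pruning (a sketch, not the paper's code).

    sparsity:  fraction of prunable weights to zero at this step.
    intensity: optional {param_name: FLOPs-per-parameter} map; dividing
               magnitudes by it biases pruning toward high-workload weights
               (workload-oriented) instead of raw parameter count
               (memory-oriented).
    """
    scores, prunable = [], []
    for name, p in model.named_parameters():
        if p.dim() < 2:                  # skip biases / norm parameters
            continue
        s = p.detach().abs()
        if intensity is not None:
            s = s / intensity[name]      # costly-to-compute weights score lower
        scores.append(s.flatten())
        prunable.append((p, s))

    all_scores = torch.cat(scores)
    k = int(sparsity * all_scores.numel())
    if k == 0:
        return
    # Single global threshold across all layers: the k-th smallest score.
    threshold = torch.kthvalue(all_scores, k).values

    with torch.no_grad():
        for p, s in prunable:
            # "Soft" pruning: weights are zeroed but remain trainable, so a
            # weight pruned in one iteration can regrow in a later epoch.
            p.mul_((s > threshold).to(p.dtype))

# Iterative use (hypothetical schedule): prune a little more after each epoch,
#   for s in (0.5, 0.7, 0.8, 0.9):
#       train_one_epoch(model); global_soft_prune(model, s, intensity)
```

Because the threshold is computed globally rather than per layer, layers whose weights matter less (or, with the intensity weighting, cost more compute) are automatically pruned more aggressively, with no per-layer sparsity hyperparameters.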
DOI: 10.1109/DOCS60977.2023.10294956