An Empirical Study on Neural Networks Pruning: Trimming for Reducing Memory or Workload

Bibliographic Details
Published in: 2023 5th International Conference on Data-driven Optimization of Complex Systems (DOCS), pp. 1-7
Authors: Xiao, Kaiwen; Cai, Xiaodong; Xu, Ke
Format: Conference paper
Language: English
Published: IEEE, 22 September 2023
Description
Abstract: Most existing studies on neural network pruning consider only memory-based pruning strategies. However, pruning for computational workload is often more important in hardware deployments, where the focus is on reducing model computation. In addition, most pruning schemes restore model accuracy during pruning at the expense of added hyperparameters, extended training time, and increased training complexity. This work proposes a statistics-based, globally soft, iterative pruning scheme. With little extra calculation, an extremely sparse model can be obtained without additional hyperparameters or extended training time. Moreover, this work proposes the concept of computational intensity to balance model memory and computational workload during pruning. Focusing on memory-oriented pruning, we achieve 303×, 100×, and 25× parameter compression on the LeNet-5 (MNIST), VGG (CIFAR-10), and AlexNet (ImageNet) models, respectively. In particular, combined with cluster quantization, the LeNet-5 model parameters can be compressed 3232×. Focusing on workload-oriented pruning, we reduce computation by 7.6× on the AlexNet model without accuracy loss, a significantly larger reduction than prior work. Finally, to verify the versatility of the pruning method, we also migrate it to object detection, achieving 10× parameter compression and 2.8× computation compression for YOLOv2 with an mAP reduction within 1%.
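To make the two ideas in the abstract concrete, here is a minimal Python/PyTorch sketch, not the authors' implementation: a statistics-based global soft-pruning step (weights whose magnitude falls below a threshold derived from the mean and standard deviation of all weights are zeroed but remain trainable, so they can recover in later iterations), and a computational-intensity score, taken here as FLOPs per parameter, to separate workload-dominated layers from memory-dominated ones. The function names and the mean-plus-k·std threshold are assumptions for illustration.

import torch
import torch.nn as nn

def soft_prune_step(model: nn.Module, k: float = 1.0) -> float:
    """Zero weights below a global statistical threshold.

    "Soft" pruning: zeroed weights stay in the network and keep
    receiving gradients, so a weight pruned in one iteration can grow
    back above the threshold later. Returns the resulting sparsity.
    (The mean + k*std rule is an assumption, not the paper's formula.)
    """
    with torch.no_grad():
        mags = torch.cat([p.abs().flatten()
                          for p in model.parameters() if p.dim() > 1])
        # Threshold derived from weight statistics: no per-layer
        # hyperparameter search is needed.
        threshold = mags.mean() + k * mags.std()
        zeroed = total = 0
        for p in model.parameters():
            if p.dim() > 1:                  # weight matrices/kernels only
                keep = p.abs() >= threshold
                p.mul_(keep)                 # zero out, but keep trainable
                zeroed += (~keep).sum().item()
                total += keep.numel()
    return zeroed / total

def computational_intensity(layer: nn.Conv2d, out_h: int, out_w: int) -> float:
    """FLOPs per parameter. For a conv layer this equals the output map
    size (each weight is reused at every output position), so early conv
    layers score high (workload-dominated) while fully connected layers
    score 1 (memory-dominated)."""
    params = layer.weight.numel()
    macs = params * out_h * out_w            # multiply-accumulates
    return macs / params

# Usage sketch: interleave soft pruning with ordinary training steps.
model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Flatten(),
                      nn.Linear(16 * 30 * 30, 10))
print(f"sparsity after one step: {soft_prune_step(model, k=1.0):.1%}")
print(f"conv-layer intensity: {computational_intensity(model[0], 30, 30):.0f}")

Under this score, workload-oriented pruning would preferentially prune high-intensity layers and memory-oriented pruning the low-intensity ones, which matches the abstract's point that the two objectives favour different parts of the network.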
DOI: 10.1109/DOCS60977.2023.10294956