Heuristic-based automatic pruning of deep neural networks


Bibliographic Details
Published in: Neural Computing & Applications, Vol. 34, no. 6, pp. 4889–4903
Main Authors: Choudhary, Tejalal, Mishra, Vipul, Goswami, Anurag, Sarangapani, Jagannathan
Format: Journal Article
Language: English
Published: London: Springer London, 01.03.2022 (Springer Nature B.V.)
ISSN: 0941-0643, 1433-3058
Description
Summary: The performance of a deep neural network (deep NN) depends on a significant number of weight parameters that must be trained, which is a computational bottleneck. The growing trend toward deeper architectures restricts training and inference on resource-constrained devices. Pruning is an important method for removing a deep NN's unimportant parameters, making deployment on resource-constrained devices easier for practical applications. In this paper, we propose a novel heuristics-based filter pruning method that automatically identifies and prunes unimportant filters, making inference faster on devices with limited resources. The unimportant filters are selected by a novel pruning estimator (γ). The proposed method is evaluated on several convolutional architectures (AlexNet, VGG16, ResNet34) and datasets (CIFAR10, CIFAR100, ImageNet). Experimental results on the large-scale ImageNet dataset show that the FLOPs of VGG16 can be reduced by up to 77.47%, achieving ≈5× inference speedup. The FLOPs of the more widely used ResNet34 are reduced by 41.94% while retaining competitive performance compared with other state-of-the-art methods.
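This record does not specify how the paper's pruning estimator γ scores filters, so the following is only a minimal sketch of filter pruning in general, assuming a NumPy convolutional weight tensor and a simple L1-norm importance score in place of γ; the function names and the 50% prune ratio are illustrative, not the authors' method.

```python
import numpy as np

def filter_importance(weights):
    """Score each output filter by the L1 norm of its weights.

    weights: array of shape (out_channels, in_channels, kH, kW).
    Returns one importance score per filter (higher = more important).
    """
    return np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)

def prune_filters(weights, prune_ratio):
    """Drop the lowest-scoring fraction of filters.

    Returns the reduced weight tensor and the (sorted) indices of the
    filters that were kept, so downstream layers can be adjusted too.
    """
    scores = filter_importance(weights)
    n_keep = max(1, int(round(weights.shape[0] * (1 - prune_ratio))))
    keep = np.sort(np.argsort(scores)[::-1][:n_keep])  # top-n_keep filters
    return weights[keep], keep

# Toy example: a conv layer with 8 filters of shape 3x3x3.
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 3, 3, 3))
pruned, kept = prune_filters(w, prune_ratio=0.5)
print(pruned.shape)  # → (4, 3, 3, 3)
```

Removing whole filters (rather than individual weights) is what yields the FLOP and latency reductions reported in the abstract, since the pruned layer produces fewer output channels and the next layer's input shrinks accordingly.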
DOI: 10.1007/s00521-021-06679-z