A ResNet-101 deep learning framework induced transfer learning strategy for moving object detection

Bibliographic Details
Published in:Image and vision computing Vol. 146; p. 105021
Main Authors: Panigrahi, Upasana, Sahoo, Prabodh Kumar, Panda, Manoj Kumar, Panda, Ganapati
Format: Journal Article
Language:English
Published: Elsevier B.V., 01.06.2024
ISSN:0262-8856, 1872-8138
Description
Summary:Background subtraction is a crucial stage in many visual surveillance systems, whose prime objective is to detect moving objects so that the system can handle real-time challenges. Over the last few decades, various methods have been developed to detect moving objects. However, many existing methods need further improvement to detect slow, moderate, and fast-moving objects in videos simultaneously, and to generalize to unseen video setups. In this article, moving objects are detected in complex videos by harnessing the potential of an encoder-decoder-type deep framework that employs a customized ResNet-101 model along with a feature pooling framework (FPF). The proposed algorithm has four-fold innovations: A pre-trained, modified ResNet-101 network with a transfer learning technique is proposed as the encoder to learn challenging video scenes adequately. The proposed encoder network employs a total of twenty-three layers with skip connections, making the model less complex. Between the encoder and decoder, the FPF module combines a max-pooling layer, a convolutional layer, and multiple convolutional layers with varying sampling rates; this FPF module can accurately preserve multi-scale and multi-dimensional features across different levels. A decoder architecture consisting of stacked convolution layers is implemented to transform the features back into image space efficiently. The efficiency of the proposed scheme is corroborated through subjective and objective analysis, and its superior efficacy is highlighted through a comparison with thirty-three existing methods.
•A ResNet-101 encoder-decoder with feature pooling is developed for moving object detection.
•Transfer learning updates the modified ResNet-101 weights on challenging video datasets.
•The modified ResNet-101 network with 23 layers is computationally efficient.
•The technique detects objects of varying speeds with high accuracy on challenging datasets.
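The encoder-FPF-decoder pipeline described in the abstract could be sketched roughly as follows. This is a minimal PyTorch illustration, not the authors' implementation: the channel widths, dilation rates, and the tiny stand-in encoder are assumptions (the paper customizes a pre-trained ResNet-101, e.g. as available via torchvision.models.resnet101).

```python
import torch
import torch.nn as nn

class FeaturePoolingFramework(nn.Module):
    """Sketch of an FPF-style block: parallel max-pool, 1x1 conv, and
    dilated 3x3 conv branches (varying sampling rates), fused by a 1x1 conv.
    All branches preserve spatial size, so multi-scale features align."""
    def __init__(self, ch=32, rates=(2, 4, 8)):
        super().__init__()
        self.pool = nn.MaxPool2d(3, stride=1, padding=1)
        self.pool_proj = nn.Conv2d(ch, ch, 1)
        self.conv1x1 = nn.Conv2d(ch, ch, 1)
        # padding == dilation keeps the spatial size of a 3x3 conv unchanged
        self.dilated = nn.ModuleList(
            [nn.Conv2d(ch, ch, 3, padding=r, dilation=r) for r in rates]
        )
        self.fuse = nn.Conv2d(ch * (2 + len(rates)), ch, 1)

    def forward(self, x):
        feats = [self.pool_proj(self.pool(x)), self.conv1x1(x)]
        feats += [conv(x) for conv in self.dilated]
        return self.fuse(torch.cat(feats, dim=1))


class MovingObjectNet(nn.Module):
    """Hypothetical encoder -> FPF -> decoder pipeline producing a
    per-pixel foreground-probability mask."""
    def __init__(self, ch=32):
        super().__init__()
        # Stand-in encoder (two stride-2 convs, overall downsampling x4);
        # the paper instead fine-tunes a modified pre-trained ResNet-101.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.fpf = FeaturePoolingFramework(ch)
        # Decoder: upsample back to image resolution, then stacked convs.
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 1, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.fpf(self.encoder(x)))
```

For a 3-channel input of size HxW (H, W divisible by 4), the model returns a 1-channel mask of the same HxW, which could then be thresholded to segment the moving objects.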
DOI:10.1016/j.imavis.2024.105021