A ResNet-101 deep learning framework induced transfer learning strategy for moving object detection
| Published in: | Image and Vision Computing, Vol. 146, Art. no. 105021 |
|---|---|
| Main authors: | , , , |
| Medium: | Journal Article |
| Language: | English |
| Publication details: | Elsevier B.V., 01.06.2024 |
| Subject: | |
| ISSN: | 0262-8856, 1872-8138 |
| Online access: | Get full text |
| Summary: | Background subtraction is a crucial stage in many visual surveillance systems. The prime objective of any such system is to detect moving objects so that it can be used to address many real-time challenges. In the last few decades, various methods have been developed to detect moving objects. However, many existing methods need further improvement to detect slow-, moderate-, and fast-moving objects in videos simultaneously and to handle unseen video setups. In this article, a noteworthy effort is made to detect moving objects in complex videos by harnessing the potential of an encoder-decoder deep framework that employs a customized ResNet-101 model along with a feature pooling framework (FPF). The proposed algorithm offers four-fold innovations: A pre-trained, modified ResNet-101 network with a transfer learning technique is proposed as an encoder to learn the challenging video scene adequately. The proposed encoder network employs a total of twenty-three layers with skip connections, making the model less complex. Between the encoder and the decoder, the FPF module combines a max-pooling layer, a convolutional layer, and multiple convolutional layers with varying sampling rates. This FPF module preserves multi-scale and multi-dimensional features across different levels accurately. A decoder architecture consisting of stacked convolution layers is implemented to transform the features into image space efficiently. The effectiveness of the proposed scheme is corroborated through subjective and objective analysis, and its superiority is demonstrated by a comparison with thirty-three existing methods.
• A ResNet-101 encoder-decoder with feature pooling is developed for moving object detection. • Transfer learning updates the modified ResNet-101 weights on challenging video datasets. • The modified ResNet-101 network with 23 layers is computationally efficient. • The technique detects objects of varying speeds with high accuracy on challenging datasets. |
|---|---|
| DOI: | 10.1016/j.imavis.2024.105021 |
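The abstract outlines an encoder-FPF-decoder pipeline: a truncated, pre-trained ResNet-101 encoder adapted by transfer learning, a feature pooling bridge that mixes max pooling with convolutions at varying sampling rates, and a stacked-convolution decoder that maps features back to image space. The following is a minimal PyTorch sketch of that kind of pipeline, not the authors' implementation: the class names (`FeaturePoolingFramework`, `MovingObjectNet`), the backbone truncation point, the channel widths, and the dilation rates are assumptions made for illustration only.

```python
# Illustrative sketch only: the paper's exact 23-layer configuration, channel
# widths, and FPF settings are not given in this record, so the values below
# are assumptions, not the authors' implementation.
import torch
import torch.nn as nn
from torchvision.models import resnet101, ResNet101_Weights


class FeaturePoolingFramework(nn.Module):
    """Hypothetical FPF: a max-pooling branch plus parallel convolutions with
    varying dilation (sampling) rates, fused by a 1x1 convolution."""

    def __init__(self, in_ch=1024, out_ch=256, rates=(1, 6, 12, 18)):
        super().__init__()
        self.pool = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                  nn.Conv2d(in_ch, out_ch, 1))
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates)
        self.fuse = nn.Conv2d(out_ch * (len(rates) + 1), out_ch, 1)

    def forward(self, x):
        feats = [self.pool(x)] + [b(x) for b in self.branches]
        return self.fuse(torch.cat(feats, dim=1))


class MovingObjectNet(nn.Module):
    """Encoder-FPF-decoder sketch: a truncated pre-trained ResNet-101 encoder
    (transfer learning), the FPF bridge, and stacked convolutions with
    upsampling that produce a per-pixel foreground mask."""

    def __init__(self):
        super().__init__()
        backbone = resnet101(weights=ResNet101_Weights.DEFAULT)
        # Keep only the early stages so the encoder stays lightweight; the
        # residual blocks already provide the skip connections mentioned above.
        self.encoder = nn.Sequential(backbone.conv1, backbone.bn1, backbone.relu,
                                     backbone.maxpool, backbone.layer1,
                                     backbone.layer2, backbone.layer3)
        self.fpf = FeaturePoolingFramework(in_ch=1024, out_ch=256)
        self.decoder = nn.Sequential(
            nn.Conv2d(256, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 1, 3, padding=1))

    def forward(self, x):
        return torch.sigmoid(self.decoder(self.fpf(self.encoder(x))))


if __name__ == "__main__":
    net = MovingObjectNet().eval()
    with torch.no_grad():
        mask = net(torch.randn(1, 3, 224, 224))  # foreground probability map
    print(mask.shape)  # torch.Size([1, 1, 224, 224])
```

Under these assumptions, dilated 3x3 convolutions stand in for the "varying sampling rates" of the FPF, and bilinear upsampling in the decoder returns the pooled features to image resolution as a foreground probability map.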