An optimized multi-scale convolutional autoencoder for efficient abnormal event detection using RGB, depth and optical flow data

Detailed bibliography
Published in: Multimedia Tools and Applications, Volume 84, Issue 28, pp. 34401-34435
Main author: Alqahtani, Abdullah
Format: Journal Article
Language: English
Publication details: New York: Springer US, 01.08.2025
Springer Nature B.V.
ISSN: 1380-7501, 1573-7721
Description
Summary: In this study, we propose a novel framework for detecting abnormal events in surveillance videos, a critical yet challenging task in security applications. This research introduces a robust and efficient solution for video anomaly detection, offering substantial improvements in surveillance systems' ability to detect abnormal events, thereby contributing to enhanced security measures in public spaces. The proposed framework utilizes a Multiscale Convolutional Autoencoder (MSCAE) that processes inputs from RGB, depth, and optical flow video clips, enhancing detection accuracy in complex scenes characterized by varying object scales, aspect ratios, and occlusions. To address noise while preserving edges in video data, we implement a two-pass bilateral smoothing filter, which is effective for noise-invariant, edge-preserving image smoothing. For object detection within these complex scenes, an enhanced Faster R-CNN model is employed. This model's performance is further refined through transfer learning on a dataset specifically composed of abnormal event videos. We also introduce significant improvements to the region proposal network (RPN) of the Faster R-CNN, particularly in non-maximum suppression (NMS) and anchor generation, to better detect anomalies in diverse and complex environments. Furthermore, the MSCAE is integrated with Long Short-Term Memory (LSTM) neural networks to classify the detected anomalies, creating an end-to-end solution for video anomaly detection. Hyperparameters of the deep learning models are optimized using the Chameleon Swarm Algorithm. Our framework was rigorously tested on the CUHK Avenue dataset, where it achieved 99.5% accuracy, significantly outperforming existing methods and demonstrating the effectiveness of our approach.
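
For orientation, the sketch below illustrates two of the components named in the summary: the two-pass bilateral smoothing step and a toy multi-scale convolutional autoencoder whose reconstruction error can serve as an anomaly score. It is a minimal sketch only; the layer widths, kernel sizes (3/5/7), filter parameters, and the scoring step are illustrative assumptions and do not reproduce the paper's actual MSCAE architecture or settings.

# Minimal sketch: two-pass bilateral smoothing + a toy multi-scale conv autoencoder.
# All sizes and parameters below are illustrative assumptions, not the paper's values.
import cv2
import torch
import torch.nn as nn

def two_pass_bilateral(frame, d=9, sigma_color=75, sigma_space=75):
    """Run OpenCV's bilateral filter twice: stronger denoising with edges preserved."""
    once = cv2.bilateralFilter(frame, d, sigma_color, sigma_space)
    return cv2.bilateralFilter(once, d, sigma_color, sigma_space)

class MultiScaleConvAutoencoder(nn.Module):
    """Encoder with parallel 3x3 / 5x5 / 7x7 branches (to capture varying object
    scales), followed by a simple convolutional decoder that reconstructs the input."""
    def __init__(self, in_channels=3):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(nn.Conv2d(in_channels, 16, k, padding=k // 2),
                          nn.ReLU(),
                          nn.MaxPool2d(2))
            for k in (3, 5, 7)
        ])
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(48, 16, 2, stride=2),
            nn.ReLU(),
            nn.Conv2d(16, in_channels, 3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # Concatenate the multi-scale feature maps along the channel axis, then decode.
        features = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.decoder(features)

# Toy usage: per-frame reconstruction error as an anomaly indicator.
model = MultiScaleConvAutoencoder()
frames = torch.rand(4, 3, 64, 64)                      # stand-in for preprocessed clips
recon = model(frames)
score = ((recon - frames) ** 2).mean(dim=(1, 2, 3))    # higher error -> more anomalous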
DOI: 10.1007/s11042-025-20608-5