Deep Crowd Anomaly Detection by Fusing Reconstruction and Prediction Networks

Bibliographic Details
Published in:	Electronics (Basel) Vol. 12; No. 7; p. 1517
Main Authors:	Sharif, Md. Haidar; Jiao, Lei; Omlin, Christian W.
Format:	Journal Article
Language:	English
Published:	Basel: MDPI AG, 1 April 2023
Subjects:	Algorithms; Anomalies; Architecture; Artificial neural networks; Computer vision; Crowds; Datasets; Design; Errors; Feature extraction; Image processing; Machine learning; Machine vision; Neural networks; Reconstruction; Statistical analysis; Surveillance; Video; Norway
ISSN:	2079-9292

Description
Abstract:	Abnormal event detection is one of the most challenging tasks in computer vision. Many existing deep anomaly detection models are based on reconstruction errors: training uses only videos of normal events, and the trained model then estimates frame-level scores for an unknown input. It is assumed that, during the testing phase, the reconstruction error gap between normal and abnormal frames is large. Yet this assumption may not always hold, because the high capacity and strong generalization of deep neural networks can allow abnormal frames to be reconstructed well too. In this paper, we design a generalized framework (rpNet) for building a series of deep models by fusing several options of a reconstruction network (rNet) and a prediction network (pNet) to detect anomalies in videos efficiently. In the rNet, either a convolutional autoencoder (ConvAE) or a skip-connected ConvAE (AEc) can be used, whereas in the pNet, either a traditional U-Net, a non-local block U-Net, or an attention block U-Net (aUnet) can be applied. Fusing rNet and pNet increases the error gap. Our deep models differ in their feature extraction capabilities. One of our models (AEcaUnet), which combines an AEc with our proposed aUnet, can secure a larger error gap and extract the high-quality features needed for video anomaly detection. Experimental results on the UCSD-Ped1, UCSD-Ped2, CUHK-Avenue, ShanghaiTech-Campus, and UMN datasets, supported by rigorous statistical analysis, show the effectiveness of our models.
DOI:	10.3390/electronics12071517