Deep unsupervised multi-modal fusion network for detecting driver distraction


Bibliographic Details
Published in: Neurocomputing (Amsterdam), Vol. 421, pp. 26–38
Main authors: Zhang, Yuxin; Chen, Yiqiang; Gao, Chenlong
Format: Journal Article
Language: English
Published: Elsevier B.V., 15 January 2021
ISSN: 0925-2312, 1872-8286
Online access: Full text
Description

Abstract:

Highlights:
• A state-of-the-art, unsupervised, end-to-end method to detect driver distraction.
• Different network architectures serving as embedding subnetworks for multiple heterogeneous sensors.
• A multi-scale feature fusion approach to aggregate multi-modal features.
• A data collection system using multivariate sensor signals, acoustic signals, and visual signals during a simulated driving task.

The risk of being involved in a road traffic crash has increased year after year. Studies show that lack of attention while driving is one of the major causes of traffic accidents. In this work, to detect driver distraction (e.g., phone conversation, eating, texting), we introduce a deep unsupervised multi-modal fusion network, termed UMMFN. It is an end-to-end model composed of three main modules: multi-modal representation learning, multi-scale feature fusion, and unsupervised driver distraction detection. The first module learns low-dimensional representations of multiple heterogeneous sensors using embedding subnetworks. The goal of multi-scale feature fusion is to learn both the temporal dependencies within each modality and the spatial dependencies across modalities. The last module uses a ConvLSTM encoder-decoder to perform an unsupervised classification task that is not affected by new types of driver behavior. During the detection phase, a fine-grained detection decision is made by using the reconstruction error of UMMFN as a score for each captured test sample. We empirically compare the proposed approach with several state-of-the-art methods on our own multi-modal dataset of distracted driving behavior. Experimental results show that UMMFN outperforms the existing approaches.
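The detection scheme described above scores each test window by its reconstruction error: windows the model reconstructs poorly are flagged as distraction. The following is a minimal illustrative sketch of that scoring idea, not the paper's UMMFN implementation; the `reconstruct` function below is a hypothetical stand-in for the trained ConvLSTM encoder-decoder, replaced here by a simple moving-average filter so the example runs on its own.

```python
import numpy as np

def reconstruct(window):
    # Hypothetical stand-in for the trained encoder-decoder: a 5-point
    # moving average, which reconstructs smooth ("attentive") signals well
    # and erratic ("distracted") signals poorly.
    kernel = np.ones(5) / 5.0
    return np.convolve(window, kernel, mode="same")

def anomaly_score(window):
    # Reconstruction error (mean squared error) serves as the score.
    return float(np.mean((window - reconstruct(window)) ** 2))

def detect(windows, threshold):
    # Windows whose score exceeds the threshold are flagged as distraction.
    return [anomaly_score(w) > threshold for w in windows]

rng = np.random.default_rng(0)
normal = np.sin(np.linspace(0, 4 * np.pi, 100))        # smooth signal
distracted = normal + rng.normal(0.0, 0.5, 100)        # erratic signal
print(detect([normal, distracted], threshold=0.05))    # → [False, True]
```

In the paper's setting the threshold would be chosen from scores observed on normal driving data; the key property illustrated here is that the score requires no labels for the distraction classes themselves.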
DOI: 10.1016/j.neucom.2020.09.023