Deep unsupervised multi-modal fusion network for detecting driver distraction
| Published in: | Neurocomputing (Amsterdam), Volume 421, pp. 26 - 38 |
|---|---|
| Main authors: | , , |
| Medium: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V., 15.01.2021 |
| ISSN: | 0925-2312, 1872-8286 |
| Online access: | Get full text |
Summary:

• A state-of-the-art, unsupervised, end-to-end method to detect driver distraction.
• Different network architectures to implement the embedding subnetworks for multiple heterogeneous sensors.
• A multi-scale feature fusion approach to aggregate multi-modal features.
• A data collection system using multivariate sensor, acoustic, and visual signals during a simulated driving task.

The risk of road traffic crashes has increased year by year. Studies show that a lack of attention while driving is one of the major causes of traffic accidents. In this work, in order to detect driver distraction, e.g., phone conversation, eating, or texting, we introduce a deep unsupervised multi-modal fusion network, termed UMMFN. It is an end-to-end model composed of three main modules: multi-modal representation learning, multi-scale feature fusion, and unsupervised driver distraction detection. The first module learns low-dimensional representations of multiple heterogeneous sensors using embedding subnetworks. The multi-scale feature fusion module learns both the temporal dependencies within each modality and the spatial dependencies across modalities. The last module uses a ConvLSTM encoder-decoder to perform an unsupervised classification task that is not affected by new types of driver behavior. During the detection phase, a fine-grained detection decision can be made by using the reconstruction error of UMMFN as a score for each captured test sample. We empirically compare the proposed approach with several state-of-the-art methods on our own multi-modal distracted-driving dataset. Experimental results show that UMMFN outperforms the existing approaches.
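The detection scheme described in the abstract lends itself to a brief illustration. Below is a minimal, hypothetical PyTorch sketch of that idea, not the authors' implementation: simple linear layers stand in for the paper's embedding subnetworks, plain concatenation for its multi-scale fusion, and ordinary LSTMs for the ConvLSTM encoder-decoder; all module names, dimensions, and modality splits are assumptions made for the example.

```python
# Hypothetical sketch (not the authors' code) of the idea in the abstract:
# per-modality embedding subnetworks, a simple fusion step, an encoder-decoder
# trained to reconstruct driving sequences, and the reconstruction error used
# as a distraction score at test time.
import torch
import torch.nn as nn


class ModalityEmbedding(nn.Module):
    """Maps one raw sensor stream (B, T, in_dim) to an embedding (B, T, emb_dim)."""

    def __init__(self, in_dim: int, emb_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, emb_dim), nn.ReLU())

    def forward(self, x):
        return self.net(x)


class ReconstructionDetector(nn.Module):
    """Encoder-decoder over fused features; plain LSTMs stand in here for the
    ConvLSTM encoder-decoder described in the paper."""

    def __init__(self, fused_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.encoder = nn.LSTM(fused_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, fused_dim)

    def forward(self, fused):                  # fused: (B, T, fused_dim)
        enc_out, _ = self.encoder(fused)
        dec_out, _ = self.decoder(enc_out)
        return self.out(dec_out)               # reconstruction, same shape as input


# Assumed modalities and feature sizes: vehicle telemetry, audio, visual features.
embedders = nn.ModuleDict({
    "telemetry": ModalityEmbedding(in_dim=10, emb_dim=16),
    "audio": ModalityEmbedding(in_dim=40, emb_dim=16),
    "visual": ModalityEmbedding(in_dim=128, emb_dim=16),
})
detector = ReconstructionDetector(fused_dim=3 * 16)


def distraction_score(batch: dict) -> torch.Tensor:
    """Per-sequence reconstruction error; higher values suggest distraction."""
    fused = torch.cat([embedders[name](x) for name, x in batch.items()], dim=-1)
    recon = detector(fused)
    return ((recon - fused) ** 2).mean(dim=(1, 2))   # one score per sequence


# Toy usage: random stand-in data, batch size 2, 50 time steps per sequence.
batch = {
    "telemetry": torch.randn(2, 50, 10),
    "audio": torch.randn(2, 50, 40),
    "visual": torch.randn(2, 50, 128),
}
print(distraction_score(batch))   # compare against a threshold chosen on validation data
```

Training is omitted from the sketch; in such a setup the model would presumably be fit to reconstruct ordinary driving sequences, so test sequences with unusually high reconstruction error are flagged as distracted against a threshold chosen on held-out data.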
| DOI: | 10.1016/j.neucom.2020.09.023 |