Multi-Modal Cross Learning for an FMCW Radar Assisted by Thermal and RGB Cameras to Monitor Gestures and Cooking Processes
This paper proposes a multi-modal cross learning approach to augment the neural network training phase by additional sensor data. The approach is multi-modal during training (i.e., radar Range-Doppler maps, thermal camera images, and RGB camera images are used for training). In inference, the approa...
Gespeichert in:
| Veröffentlicht in: | IEEE access Jg. 9; S. 22295 - 22303 |
|---|---|
| Hauptverfasser: | , , , |
| Format: | Journal Article |
| Sprache: | Englisch |
| Veröffentlicht: |
Piscataway
IEEE
2021
The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Schlagworte: | |
| ISSN: | 2169-3536, 2169-3536 |
| Online-Zugang: | Volltext |
| Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
| Zusammenfassung: | This paper proposes a multi-modal cross learning approach to augment the neural network training phase by additional sensor data. The approach is multi-modal during training (i.e., radar Range-Doppler maps, thermal camera images, and RGB camera images are used for training). In inference, the approach is single-modal (i.e., only radar Range-Doppler maps are needed for classification). The proposed approach uses a multi-modal autoencoder training which creates a compressed data representation containing correlated features across modalities. The encoder part is then used as a pretrained network for the classification task. The benefits are that expensive sensors like high resolution thermal cameras are not needed in the application but a higher classification accuracy is achieved because of the multi-modal cross learning during training. The autoencoders can also be used to generate hallucinated data of the absent sensors. The hallucinated data can be used for user interfaces, a further classification, or other tasks. The proposed approach is verified within a simultaneous cooking process classification, <inline-formula> <tex-math notation="LaTeX">2\times 2 </tex-math></inline-formula> cooktop occupancy detection, and gesture recognition task. The main functionality is an overboil protection and gesture control of a <inline-formula> <tex-math notation="LaTeX">2\times 2 </tex-math></inline-formula> cooktop. The multi-modal cross learning approach considerably outperforms single-modal approaches on that challenging classification task. |
|---|---|
| Bibliographie: | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 |
| ISSN: | 2169-3536 2169-3536 |
| DOI: | 10.1109/ACCESS.2021.3056878 |