SMAB: Simple Multimodal Attention for Effective BEV Fusion

Saved in:
Detailed bibliography
Title: SMAB: Simple Multimodal Attention for Effective BEV Fusion
Authors: Mustajbasic, Amer (1976); Chen, Shuangshuang; Stenborg, Erik; Selpi, Selpi (1977)
Source: Djupt multimodalt lärande för fordonstillämpningar [Deep multimodal learning for vehicle applications]; 36th IEEE Intelligent Vehicles Symposium (IV 2025), Cluj-Napoca, Romania. IEEE Intelligent Vehicles Symposium, Proceedings: 1766-1772
Subjects: lightweight sensor fusion architecture, radar, multimodal fusion, camera, sparse signal fusion, multimodal BEV fusion, sensor fusion, multimodal attention BEV, Multimodal learning, lidar, deep learning, BEV feature aggregation, BEV
Description: Sensor fusion plays a crucial role in accurate and robust environment perception for autonomous driving. Recent works utilize the Bird's-Eye-View (BEV) grid as a 3D representation, but use only a partial set of multimodal signals. This paper introduces Simple-Multimodal-Attention-BEV (SMAB), a novel and simple approach to multimodal sensor fusion in BEV perception. We propose an attention mechanism called BEV feature aggregation that effectively enhances BEV feature representations. It integrates bilinearly interpolated semantic data from cameras with rasterized distance information from radars and/or lidars, and supports training with full-modality or partial-modality data without modification of the method. In addition to the simplicity of the design, we demonstrate that using all sensor modalities improves segmentation accuracy. Meanwhile, SMAB is resilient to sporadic sensor signal loss, which enhances the robustness of the perception system. The proposed method outperforms state-of-the-art methods while simplifying the model.
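The abstract describes attention-weighted aggregation of per-cell BEV features from multiple modalities, with graceful handling of missing signals. The paper's actual formulation is not reproduced here; the following is a minimal illustrative sketch in which the function name `fuse_bev_cell`, the scalar per-modality scores, and the softmax weighting are all assumptions, not the authors' method.

```python
import math

def fuse_bev_cell(modality_feats, scores):
    """Attention-weighted fusion of one BEV cell's modality features.

    modality_feats: dict of modality name -> feature vector (list of
                    floats), or None if that sensor's signal is missing.
    scores:         dict of modality name -> scalar relevance score
                    (in a real model these would be learned).

    Missing modalities are simply excluded and the attention weights are
    renormalized over the modalities that remain, which is one simple way
    to tolerate sporadic sensor signal loss.
    """
    present = [m for m, f in modality_feats.items() if f is not None]
    # Softmax over the scores of the modalities that are actually present.
    exps = {m: math.exp(scores[m]) for m in present}
    z = sum(exps.values())
    weights = {m: exps[m] / z for m in present}
    # Weighted sum of the per-modality feature vectors.
    dim = len(modality_feats[present[0]])
    fused = [sum(weights[m] * modality_feats[m][i] for m in present)
             for i in range(dim)]
    return fused, weights

# With equal scores, camera and lidar features are averaged; dropping
# the lidar signal renormalizes all weight onto the camera.
feats = {"camera": [1.0, 0.0], "lidar": [0.0, 1.0]}
scores = {"camera": 0.0, "lidar": 0.0}
fused, w = fuse_bev_cell(feats, scores)          # fused == [0.5, 0.5]
fused2, w2 = fuse_bev_cell({"camera": [1.0, 0.0], "lidar": None}, scores)
```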
File description: electronic
Access URL: https://research.chalmers.se/publication/548078
https://research.chalmers.se/publication/546739
https://research.chalmers.se/publication/548078/file/548078_Fulltext.pdf
Database: SwePub
ISSN: 1931-0587, 2642-7214
DOI: 10.1109/IV64158.2025.11097770