SMAB: Simple Multimodal Attention for Effective BEV Fusion
Saved in:
| Title: | SMAB: Simple Multimodal Attention for Effective BEV Fusion |
|---|---|
| Authors: | Mustajbasic, Amer (b. 1976); Chen, Shuangshuang; Stenborg, Erik; Selpi, Selpi (b. 1977) |
| Source: | Deep multimodal learning for vehicle applications (Djupt multimodalt lärande för fordonstillämpningar). 36th IEEE Intelligent Vehicles Symposium, IV 2025, Cluj-Napoca, Romania. IEEE Intelligent Vehicles Symposium, Proceedings, pp. 1766-1772 |
| Subjects: | lightweight sensor fusion architecture, radar, multimodal fusion, camera, sparse signal fusion, multimodal BEV fusion, sensor fusion, multimodal attention BEV, Multimodal learning, lidar, deep learning, BEV feature aggregation, BEV |
| Description: | Sensor fusion plays a crucial role in accurate and robust environment perception for autonomous driving. Recent works utilize a Bird's-Eye-View (BEV) grid as a 3D representation; however, they use only a partial set of the available multimodal signals. This paper introduces Simple-Multimodal-Attention-BEV (SMAB), a novel and simple approach to multimodal sensor fusion in BEV perception. We propose an attention mechanism called BEV feature aggregation that effectively enhances BEV feature representations. It integrates bilinearly interpolated semantic data from cameras with rasterized distance information from radars and/or lidars, and facilitates training with full-modality or partial-modality data without modification of the method. In addition to the simplicity of the design, we demonstrate that using all sensor modalities improves segmentation accuracy. Meanwhile, SMAB is resilient to sporadic sensor signal loss, which enhances the robustness of the perception system. The proposed method outperforms state-of-the-art methods while simplifying the model. |
| File format: | electronic |
| Access URLs: | https://research.chalmers.se/publication/548078 https://research.chalmers.se/publication/546739 https://research.chalmers.se/publication/548078/file/548078_Fulltext.pdf |
| Database: | SwePub |
| ISSN: | 1931-0587, 2642-7214 |
| DOI: | 10.1109/IV64158.2025.11097770 |
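The description outlines an attention mechanism that fuses per-modality BEV feature maps (bilinearly sampled camera semantics, rasterized radar/lidar distances) and tolerates missing modalities. As a rough illustration only, the cell-wise attention idea can be sketched as follows; this is not the paper's actual SMAB architecture, and all names, shapes, and the NumPy formulation are assumptions for the sketch.

```python
import numpy as np

def softmax(x, axis=0):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def aggregate_bev_features(modality_feats, attn_logits):
    """Fuse per-modality BEV feature maps with per-cell attention weights.

    modality_feats: (M, H, W, C) -- one BEV feature map per modality
                    (e.g. camera semantics, lidar/radar distance rasters)
    attn_logits:    (M, H, W)    -- attention logits per modality and cell
                    (learned in a real model; given here for the sketch)

    A missing modality can simply be dropped from both arrays before
    the call, so partial-modality input needs no change to the routine.
    """
    w = softmax(attn_logits, axis=0)              # weights over modalities
    return (w[..., None] * modality_feats).sum(axis=0)  # fused (H, W, C)

# Toy example: camera + lidar features on a 4x4 BEV grid with 8 channels.
rng = np.random.default_rng(0)
feats = rng.normal(size=(2, 4, 4, 8))
logits = rng.normal(size=(2, 4, 4))
fused = aggregate_bev_features(feats, logits)
print(fused.shape)  # (4, 4, 8)
```

The per-cell softmax means each BEV cell can weight modalities independently, which is one plausible way a fused grid could remain usable when a sensor stream is sporadically absent.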