BEVCon: Advancing Bird's Eye View Perception with Contrastive Learning

Detailed bibliography
Published in:	IEEE Robotics and Automation Letters, Vol. 10, No. 4, pp. 1-7
Main authors:	Leng, Ziyang, Yang, Jiawei, Ren, Zhicheng, Zhou, Bolei
Medium:	Journal Article
Language:	English
Publication details:	Piscataway: IEEE, 2025-04-01; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:	and Categorization Annotations Autonomous vehicles Coders Contrastive learning Deep Learning for Visual Perception Feature extraction Head Learning Modules Object detection Object recognition Perception Representation learning Representations Segmentation Three-dimensional displays Training Transforms
ISSN:	2377-3766

Description
Abstract:	We present BEVCon, a simple yet effective contrastive learning framework designed to improve Bird's Eye View (BEV) perception in autonomous driving. BEV perception offers a top-down-view representation of the surrounding environment, making it crucial for 3D object detection, segmentation, and trajectory prediction tasks. While prior work has primarily focused on enhancing BEV encoders and task-specific heads, we address the underexplored potential of representation learning in BEV models. BEVCon introduces two contrastive learning modules: an instance feature contrast module for refining BEV features and a perspective view contrast module that enhances the image backbone. The dense contrastive learning, applied on top of the detection losses, leads to improved feature representations across both the BEV encoder and the backbone. Extensive experiments on the nuScenes dataset demonstrate that BEVCon achieves consistent performance gains, with up to +2.4% mAP improvement over state-of-the-art baselines. Our results highlight the critical role of representation learning in BEV perception and offer a complementary avenue to conventional task-specific optimizations. Code and models are available at https://github.com/matthew-leng/BEVCon.
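The record does not specify BEVCon's exact loss, so the following is only a generic illustration of the instance-level contrastive idea the abstract mentions. It is a minimal InfoNCE sketch under stated assumptions: each row of `anchors` and `positives` is a hypothetical per-instance feature vector (for example, BEV features pooled at one object in two views), matching rows are positives, and all other rows act as negatives. The function name `info_nce`, the `temperature` value, and the synthetic data are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
import numpy as np

def info_nce(anchors: np.ndarray, positives: np.ndarray, temperature: float = 0.1) -> float:
    """Generic InfoNCE loss (hypothetical sketch, not BEVCon's code).

    Row i of `anchors` should match row i of `positives`; every other
    row of `positives` is treated as a negative for anchor i.
    """
    # L2-normalise so that dot products become cosine similarities.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature  # (N, N) similarity matrix
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy where the matching index i is the target class.
    return float(-np.mean(np.diag(log_probs)))

# Usage with synthetic data (assumed, for illustration only):
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 64))                      # 8 hypothetical instance features
noisy = feats + 0.01 * rng.normal(size=feats.shape)   # a second "view" of the same instances
aligned_loss = info_nce(feats, noisy)                 # positives correctly paired
mismatched_loss = info_nce(feats, noisy[::-1])        # pairing deliberately broken
print(aligned_loss, mismatched_loss)
```

Correctly paired instances yield a much lower loss than a deliberately broken pairing, which is the signal a contrastive module uses to pull features of the same object together and push different objects apart.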
DOI:	10.1109/LRA.2025.3540386