BEVCon: Advancing Bird's Eye View Perception with Contrastive Learning

Bibliographic Details
Published in: IEEE Robotics and Automation Letters, Vol. 10, no. 4, pp. 1-7
Main Authors: Leng, Ziyang; Yang, Jiawei; Ren, Zhicheng; Zhou, Bolei
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2025-04-01
ISSN: 2377-3766
Abstract We present BEVCon, a simple yet effective contrastive learning framework designed to improve Bird's Eye View (BEV) perception in autonomous driving. BEV perception offers a top-down representation of the surrounding environment, making it crucial for 3D object detection, segmentation, and trajectory prediction tasks. While prior work has primarily focused on enhancing BEV encoders and task-specific heads, we address the underexplored potential of representation learning in BEV models. BEVCon introduces two contrastive learning modules: an instance feature contrast module that refines BEV features and a perspective view contrast module that enhances the image backbone. Dense contrastive learning, applied on top of the detection losses, leads to improved feature representations in both the BEV encoder and the backbone. Extensive experiments on the nuScenes dataset demonstrate that BEVCon yields consistent performance gains, with up to +2.4% mAP improvement over state-of-the-art baselines. Our results highlight the critical role of representation learning in BEV perception and offer a complementary avenue to conventional task-specific optimizations. Code and models are available at https://github.com/matthew-leng/BEVCon .
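Illustrative note: the abstract does not specify the loss used by the instance feature contrast module. The sketch below is not the authors' implementation; it shows a generic supervised contrastive (SupCon-style) loss of the kind such a module might apply to per-instance feature vectors. The function names, the pooled `features` input, and the `temperature` value are all assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """SupCon-style loss over per-instance feature vectors.

    features: list of equal-length float lists (hypothetical pooled instance features)
    labels:   list of class ids, one per feature
    Both the input layout and the default temperature are assumptions.
    """
    feats = [l2_normalize(f) for f in features]
    n = len(feats)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no positive contributes nothing
        # Cosine similarity to every other sample, scaled by the temperature.
        logits = {
            j: sum(a * b for a, b in zip(feats[i], feats[j])) / temperature
            for j in range(n) if j != i
        }
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        # Average negative log-probability of the positives for this anchor.
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

In this sketch, features that share a class label are pulled together and all others are pushed apart, so the loss is small when same-class instances already have similar features. In a detection pipeline, a term like this would typically be added to the detection losses with a weighting factor; that weighting is likewise an assumption here, not something the record states.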
Author Leng, Ziyang
Yang, Jiawei
Ren, Zhicheng
Zhou, Bolei
Author_xml – sequence: 1
  givenname: Ziyang
  surname: Leng
  fullname: Leng, Ziyang
  organization: University of California, Los Angeles, USA
– sequence: 2
  givenname: Jiawei
  surname: Yang
  fullname: Yang, Jiawei
  organization: University of Southern California, USA
– sequence: 3
  givenname: Zhicheng
  surname: Ren
  fullname: Ren, Zhicheng
  organization: Aurora Innovation, USA
– sequence: 4
  givenname: Bolei
  surname: Zhou
  fullname: Zhou, Bolei
  organization: University of California, Los Angeles, USA
CODEN IRALC6
CitedBy_id crossref_primary_10_1016_j_dsp_2025_105518
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2025
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2025
DOI 10.1109/LRA.2025.3540386
DatabaseName IEEE Xplore (IEEE)
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Electronic Library (IEL)
CrossRef
Computer and Information Systems Abstracts
Electronics & Communications Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
Discipline Engineering
EISSN 2377-3766
EndPage 7
ExternalDocumentID 10_1109_LRA_2025_3540386
10878497
Genre orig-research
ISSN 2377-3766
IsPeerReviewed true
IsScholarly true
Issue 4
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
ORCID 0009-0007-1817-071X
0000-0003-4030-0684
PQID 3169247687
PQPubID 4437225
PageCount 7
PublicationDate 2025-04-01
PublicationDateYYYYMMDD 2025-04-01
PublicationDate_xml – month: 04
  year: 2025
  text: 2025-04-01
  day: 01
PublicationDecade 2020
PublicationPlace Piscataway
PublicationPlace_xml – name: Piscataway
PublicationTitle IEEE robotics and automation letters
PublicationTitleAbbrev LRA
PublicationYear 2025
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
StartPage 1
SubjectTerms and Categorization
Annotations
Autonomous vehicles
Coders
Contrastive learning
Deep Learning for Visual Perception
Feature extraction
Head
Learning
Modules
Object detection
Object recognition
Perception
Representation learning
Representations
Segmentation
Three-dimensional displays
Training
Transforms
Title BEVCon: Advancing Bird's Eye View Perception with Contrastive Learning
URI https://ieeexplore.ieee.org/document/10878497
https://www.proquest.com/docview/3169247687
Volume 10