BEVCon: Advancing Bird's Eye View Perception with Contrastive Learning
| Published in: | IEEE robotics and automation letters Vol. 10; no. 4; pp. 1 - 7 |
|---|---|
| Main Authors: | Ziyang Leng, Jiawei Yang, Zhicheng Ren, Bolei Zhou |
| Format: | Journal Article |
| Language: | English |
| Published: | Piscataway, IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.04.2025 |
| ISSN: | 2377-3766 |
| Abstract | We present BEVCon, a simple yet effective contrastive learning framework designed to improve Bird's Eye View (BEV) perception in autonomous driving. BEV perception offers a top-down representation of the surrounding environment, making it crucial for 3D object detection, segmentation, and trajectory prediction. While prior work has primarily focused on enhancing BEV encoders and task-specific heads, we address the underexplored potential of representation learning in BEV models. BEVCon introduces two contrastive learning modules: an instance feature contrast module that refines BEV features and a perspective view contrast module that enhances the image backbone. This dense contrastive learning, built on top of the detection losses, improves feature representations in both the BEV encoder and the backbone. Extensive experiments on the nuScenes dataset show that BEVCon yields consistent performance gains, with up to +2.4% mAP improvement over state-of-the-art baselines. Our results highlight the critical role of representation learning in BEV perception and offer a complementary avenue to conventional task-specific optimizations. Code and models are available at https://github.com/matthew-leng/BEVCon. |
|---|---|
| Author | Leng, Ziyang; Yang, Jiawei; Ren, Zhicheng; Zhou, Bolei |
| Author_xml | 1. Leng, Ziyang (University of California, Los Angeles, USA); 2. Yang, Jiawei (University of Southern California, USA); 3. Ren, Zhicheng (Aurora Innovation, USA); 4. Zhou, Bolei (University of California, Los Angeles, USA) |
| CODEN | IRALC6 |
| CitedBy_id | crossref_primary_10_1016_j_dsp_2025_105518 |
| Cites_doi | 10.1109/CVPR42600.2020.01164 10.5555/3495724.3497510 10.1109/ICCV51070.2023.00310 10.1109/CVPR52729.2023.01385 10.5555/3524938.3525087 10.1109/ICCV51070.2023.00637 10.1109/ICCV51070.2023.00302 10.1007/978-3-031-19812-0_31 10.1109/ICCV48922.2021.00986 10.1007/978-3-030-58568-6_12 10.1109/CVPR.2009.5206848 10.1109/ICCV51070.2023.00335 10.1109/CVPR42600.2020.00975 10.1109/CVPR42600.2020.00252 10.1109/CVPR52729.2023.01712 10.1109/ICCV51070.2023.00575 10.1109/LRA.2020.3004325 10.1109/LRA.2022.3146898 10.1109/ICCV48922.2021.00718 10.1007/978-3-030-58452-8_13 10.1109/ICRA48891.2023.10160968 10.1609/aaai.v37i2.25233 10.1109/ICCV.2017.322 10.1007/978-3-031-20077-9_1 10.1609/aaai.v37i1.25185 10.1007/978-3-031-19809-0_7 10.1109/ICRA48891.2023.10160831 10.1109/CVPR.2016.90 10.1109/ICCV48922.2021.00811 10.1109/CVPR52729.2023.01710 10.1109/ICRA57147.2024.10611615 |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2025 |
| DBID | 97E RIA RIE AAYXX CITATION 7SC 7SP 8FD JQ2 L7M L~C L~D |
| DOI | 10.1109/LRA.2025.3540386 |
| DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005–Present IEEE All-Society Periodicals Package (ASPP) 1998–Present IEEE Electronic Library (IEL) CrossRef Computer and Information Systems Abstracts Electronics & Communications Abstracts Technology Research Database ProQuest Computer Science Collection Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional |
| DatabaseTitle | CrossRef Technology Research Database Computer and Information Systems Abstracts – Academic Electronics & Communications Abstracts ProQuest Computer Science Collection Computer and Information Systems Abstracts Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Professional |
| DatabaseTitleList | Technology Research Database |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Engineering |
| EISSN | 2377-3766 |
| EndPage | 7 |
| ExternalDocumentID | 10_1109_LRA_2025_3540386 10878497 |
| Genre | orig-research |
| GroupedDBID | 0R~ 97E AAJGR AARMG AASAJ AAWTH ABAZT ABQJQ ABVLG ACGFS AGQYO AHBIQ AKJIK AKQYR ALMA_UNASSIGNED_HOLDINGS ATWAV BEFXN BFFAM BGNUA BKEBE BPEOZ EBS IFIPE IPLJI JAVBF KQ8 M43 M~E O9- OCL RIA RIE AAYXX AGSQL CITATION EJD 7SC 7SP 8FD JQ2 L7M L~C L~D |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 0 |
| ISSN | 2377-3766 |
| IngestDate | Mon Jun 30 12:32:47 EDT 2025 Sat Nov 29 08:16:58 EST 2025 Tue Nov 18 22:18:42 EST 2025 Wed Aug 27 01:52:53 EDT 2025 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 4 |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
| LinkModel | DirectLink |
| ORCID | 0009-0007-1817-071X 0000-0003-4030-0684 |
| PQID | 3169247687 |
| PQPubID | 4437225 |
| PageCount | 7 |
| ParticipantIDs | crossref_primary_10_1109_LRA_2025_3540386 proquest_journals_3169247687 ieee_primary_10878497 crossref_citationtrail_10_1109_LRA_2025_3540386 |
| PublicationCentury | 2000 |
| PublicationDate | 2025-04-01 |
| PublicationDateYYYYMMDD | 2025-04-01 |
| PublicationDecade | 2020 |
| PublicationPlace | Piscataway |
| PublicationTitle | IEEE robotics and automation letters |
| PublicationTitleAbbrev | LRA |
| PublicationYear | 2025 |
| Publisher | IEEE (The Institute of Electrical and Electronics Engineers, Inc.) |
| SSID | ssj0001527395 |
| Score | 2.294195 |
| SourceID | proquest crossref ieee |
| SourceType | Aggregation Database Enrichment Source Index Database Publisher |
| StartPage | 1 |
| SubjectTerms | and Categorization; Annotations; Autonomous vehicles; Coders; Contrastive learning; Deep Learning for Visual Perception; Feature extraction; Head; Learning; Modules; Object detection; Object recognition; Perception; Representation learning; Representations; Segmentation; Three-dimensional displays; Training; Transforms |
| Title | BEVCon: Advancing Bird's Eye View Perception with Contrastive Learning |
| URI | https://ieeexplore.ieee.org/document/10878497 https://www.proquest.com/docview/3169247687 |
| Volume | 10 |