Embracing Single Stride 3D Object Detector with Sparse Transformer
In LiDAR-based 3D object detection for autonomous driving, the ratio of the object size to input scene size is significantly smaller compared to 2D detection cases. Overlooking this difference, many 3D detectors directly follow the common practice of 2D detectors, which downsample the feature maps...
| Published in: | Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online), pp. 8448-8458 |
|---|---|
| Main authors: | Fan, Lue; Pang, Ziqi; Zhang, Tianyuan; Wang, Yu-Xiong; Zhao, Hang; Wang, Feng; Wang, Naiyan; Zhang, Zhaoxiang |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 1 June 2022 |
| ISSN: | 1063-6919 |
| Online access: | https://ieeexplore.ieee.org/document/9878875 |
| Abstract | In LiDAR-based 3D object detection for autonomous driving, the ratio of the object size to input scene size is significantly smaller compared to 2D detection cases. Overlooking this difference, many 3D detectors directly follow the common practice of 2D detectors, which downsample the feature maps even after quantizing the point clouds. In this paper, we start by rethinking how such a multi-stride stereotype affects LiDAR-based 3D object detectors. Our experiments point out that the downsampling operations bring few advantages and lead to inevitable information loss. To remedy this issue, we propose the Single-stride Sparse Transformer (SST) to maintain the original resolution from the beginning to the end of the network. Armed with transformers, our method addresses the problem of insufficient receptive field in single-stride architectures. It also cooperates well with the sparsity of point clouds and naturally avoids expensive computation. Eventually, our SST achieves state-of-the-art results on the large-scale Waymo Open Dataset. It is worth mentioning that our method can achieve exciting performance (83.8 LEVEL_1 AP on the validation split) on small object (pedestrian) detection due to the characteristic of single stride. Our code will be made public soon. |
|---|---|
| Author | Pang, Ziqi; Fan, Lue; Zhao, Hang; Wang, Naiyan; Zhang, Tianyuan; Wang, Yu-Xiong; Zhang, Zhaoxiang; Wang, Feng |
| Author_xml | 1. Fan, Lue (fanlue2019@ia.ac.cn), Institute of Automation, Chinese Academy of Sciences; 2. Pang, Ziqi (ziqip2@illinois.edu), University of Illinois Urbana-Champaign; 3. Zhang, Tianyuan (tianyuaz@andrew.cmu.edu), Carnegie Mellon University; 4. Wang, Yu-Xiong (yxw@illinois.edu), Institute of Automation, Chinese Academy of Sciences; 5. Zhao, Hang (hangzhao@mail.tsinghua.edu.cn), Tsinghua University; 6. Wang, Feng (feng.wff@gmail.com), Institute of Automation, Chinese Academy of Sciences; 7. Wang, Naiyan (winsty@gmail.com), TuSimple; 8. Zhang, Zhaoxiang (zhaoxiang.zhang@ia.ac.cn), Institute of Automation, Chinese Academy of Sciences |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DOI | 10.1109/CVPR52688.2022.00827 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP) 1998-present |
| Discipline | Applied Sciences |
| EISBN | 1665469463; 9781665469463 |
| EISSN | 1063-6919 |
| EndPage | 8458 |
| ExternalDocumentID | 9878875 |
| Genre | orig-research |
| ISICitedReferencesCount | 167 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| PageCount | 11 |
| PublicationCentury | 2000 |
| PublicationDate | 2022-June |
| PublicationDateYYYYMMDD | 2022-06-01 |
| PublicationDate_xml | – month: 06 year: 2022 text: 2022-June |
| PublicationDecade | 2020 |
| PublicationTitle | Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online) |
| PublicationTitleAbbrev | CVPR |
| PublicationYear | 2022 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| StartPage | 8448 |
| SubjectTerms | 3D from multi-view and sensors; Navigation and autonomous driving; Detectors; Navigation; Object detection; Point cloud compression; Sensor phenomena and characterization; Three-dimensional displays; Transformers |
| Title | Embracing Single Stride 3D Object Detector with Sparse Transformer |
| URI | https://ieeexplore.ieee.org/document/9878875 |
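
The abstract argues that objects in LiDAR scenes are small relative to the scene and that downsampling the quantized point cloud loses information. A back-of-the-envelope calculation can make that concrete. The sketch below is not taken from the paper: the function name `cells_covered`, the voxel size, the pedestrian footprint, and the strides are all hypothetical values chosen only for illustration.

```python
# Illustrative sketch only: every numeric value here is an assumption,
# not a setting reported in this record or in the paper.

def cells_covered(object_size_m: float, voxel_size_m: float, stride: int) -> float:
    """Return how many feature-map cells an object spans along one axis.

    After quantizing with `voxel_size_m` and downsampling by `stride`,
    each feature-map cell covers voxel_size_m * stride metres.
    """
    if voxel_size_m <= 0 or stride < 1:
        raise ValueError("voxel_size_m must be positive and stride >= 1")
    return object_size_m / (voxel_size_m * stride)


if __name__ == "__main__":
    pedestrian_width_m = 0.8   # hypothetical object footprint
    voxel_size_m = 0.32        # hypothetical voxel edge length
    for stride in (1, 2, 4, 8):
        span = cells_covered(pedestrian_width_m, voxel_size_m, stride)
        print(f"stride {stride}: object spans {span:.2f} cells per axis")
```

With these assumed numbers the object spans 2.5 cells per axis at stride 1 but less than one cell at stride 8, which is the kind of resolution loss a single-stride design is meant to avoid.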