Embracing Single Stride 3D Object Detector with Sparse Transformer

Bibliographic details
Published in: Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online), pp. 8448-8458
Main authors: Fan, Lue; Pang, Ziqi; Zhang, Tianyuan; Wang, Yu-Xiong; Zhao, Hang; Wang, Feng; Wang, Naiyan; Zhang, Zhaoxiang
Format: Conference paper
Language: English
Publication details: IEEE, 1 June 2022
Subject:
ISSN: 1063-6919
Online access: Get full text
Abstract In LiDAR-based 3D object detection for autonomous driving, the ratio of the object size to input scene size is significantly smaller compared to 2D detection cases. Overlooking this difference, many 3D detectors directly follow the common practice of 2D detectors, which downsample the feature maps even after quantizing the point clouds. In this paper, we start by rethinking how such multi-stride stereotype affects the LiDAR-based 3D object detectors. Our experiments point out that the downsampling operations bring few advantages, and lead to inevitable information loss. To remedy this issue, we propose Single-stride Sparse Transformer (SST) to maintain the original resolution from the beginning to the end of the network. Armed with transformers, our method addresses the problem of insufficient receptive field in single-stride architectures. It also cooperates well with the sparsity of point clouds and naturally avoids expensive computation. Eventually, our SST achieves state-of-the-art results on the large-scale Waymo Open Dataset. It is worth mentioning that our method can achieve exciting performance (83.8 LEVEL_1 AP on validation split) on small object (pedestrian) detection due to the characteristic of single stride. Our codes will be public soon.
Author Pang, Ziqi
Fan, Lue
Zhao, Hang
Wang, Naiyan
Zhang, Tianyuan
Wang, Yu-Xiong
Zhang, Zhaoxiang
Wang, Feng
Author_xml – sequence: 1
  givenname: Lue
  surname: Fan
  fullname: Fan, Lue
  email: fanlue2019@ia.ac.cn
  organization: Institute of Automation, Chinese Academy of Sciences
– sequence: 2
  givenname: Ziqi
  surname: Pang
  fullname: Pang, Ziqi
  email: ziqip2@illinois.edu
  organization: University of Illinois Urbana-Champaign
– sequence: 3
  givenname: Tianyuan
  surname: Zhang
  fullname: Zhang, Tianyuan
  email: tianyuaz@andrew.cmu.edu
  organization: Carnegie Mellon University
– sequence: 4
  givenname: Yu-Xiong
  surname: Wang
  fullname: Wang, Yu-Xiong
  email: yxw@illinois.edu
  organization: University of Illinois Urbana-Champaign
– sequence: 5
  givenname: Hang
  surname: Zhao
  fullname: Zhao, Hang
  email: hangzhao@mail.tsinghua.edu.cn
  organization: Tsinghua University
– sequence: 6
  givenname: Feng
  surname: Wang
  fullname: Wang, Feng
  email: feng.wff@gmail.com
  organization: Institute of Automation, Chinese Academy of Sciences
– sequence: 7
  givenname: Naiyan
  surname: Wang
  fullname: Wang, Naiyan
  email: winsty@gmail.com
  organization: TuSimple
– sequence: 8
  givenname: Zhaoxiang
  surname: Zhang
  fullname: Zhang, Zhaoxiang
  email: zhaoxiang.zhang@ia.ac.cn
  organization: Institute of Automation, Chinese Academy of Sciences
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IH
CBEJK
RIE
RIO
DOI 10.1109/CVPR52688.2022.00827
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Applied Sciences
EISBN 1665469463
9781665469463
EISSN 1063-6919
EndPage 8458
ExternalDocumentID 9878875
Genre orig-research
ID FETCH-LOGICAL-i203t-1d532ea8a12aab121b4f1fa03084acb16e5a6fe3d67647c98b7aaf66a08d87123
IEDL.DBID RIE
ISICitedReferencesCount 167
IngestDate Wed Aug 27 02:15:11 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-i203t-1d532ea8a12aab121b4f1fa03084acb16e5a6fe3d67647c98b7aaf66a08d87123
PageCount 11
ParticipantIDs ieee_primary_9878875
PublicationCentury 2000
PublicationDate 2022-June
PublicationDateYYYYMMDD 2022-06-01
PublicationDate_xml – month: 06
  year: 2022
  text: 2022-June
PublicationDecade 2020
PublicationTitle Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online)
PublicationTitleAbbrev CVPR
PublicationYear 2022
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 8448
SubjectTerms 3D from multi-view and sensors; Navigation and autonomous driving
Detectors
Navigation
Object detection
Point cloud compression
Sensor phenomena and characterization
Three-dimensional displays
Transformers
Title Embracing Single Stride 3D Object Detector with Sparse Transformer
URI https://ieeexplore.ieee.org/document/9878875
hasFullText 1
inHoldings 1
isFullTextHit
isPrint