SparseDet: Towards efficient multi-view 3D object detection via sparse scene representation


Bibliographic details
Published in: Advanced engineering informatics, Vol. 62, p. 102955
Main authors: Li, Jingzhong; Yang, Lin; Shi, Zhen; Chen, Yuxuan; Jin, Yue; Akiyama, Kanta; Xu, Anze
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.10.2024
ISSN: 1474-0346
Online access: Full text
Abstract
• An efficient multi-view 3D object detection algorithm, SparseDet, is proposed to construct a sparse scene representation, which can substantially reduce computational demands while improving detection accuracy.
• A sparse sampling strategy using both category-aware and geometry-aware supervision is introduced to enhance foreground recognition.
• A background aggregation module is designed to condense extensive background features into a compact set, which can significantly reduce computational costs while adaptively retaining contextual information.
• Experimental results demonstrate the superiority of our method over state-of-the-art methods in both detection accuracy and inference speed.

Efficient and reliable 3D object detection via multi-view cameras is pivotal for improving the safety and facilitating the cost-effective deployment of autonomous driving systems. However, because they learn dense scene representations, existing methods still suffer from high computational costs and excessive noise, which affect the efficiency and accuracy of inference. To overcome this challenge, we propose SparseDet, a model that exploits sparse scene representations. Specifically, a sparse sampling module with category-aware and geometry-aware supervision is first introduced to adaptively sample foreground features at both semantic and instance levels. Additionally, to conserve computational resources while retaining context information, we propose a background aggregation module designed to compress extensive background features into a compact set. These strategies can markedly reduce feature volume while preserving essential information, boosting computational efficiency without compromising accuracy. Owing to the efficient sparse scene representation, SparseDet achieves leading performance on the widely used nuScenes benchmark. Comprehensive experiments validate that SparseDet surpasses PETR while reducing the decoder's computational complexity by 47% in terms of FLOPs, enabling a leading inference speed of 35.6 FPS on a single RTX3090 GPU.
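The two ideas named in the abstract, keeping a sparse set of foreground features and compressing the remaining background features into a few tokens, can be illustrated with a minimal sketch. This is not the authors' implementation: in the paper both modules are learned under category-aware and geometry-aware supervision, whereas here the foreground scores and the aggregation queries are simply passed in as inputs. The function names, the `keep_ratio` parameter, and the softmax-weighted pooling are all assumptions made for illustration.

```python
import math


def sample_foreground(fg_scores, keep_ratio):
    """Split feature indices into top-scoring foreground and the rest.

    fg_scores is a list of per-feature foreground scores (assumed given;
    a real model would predict them). Returns (fg_idx, bg_idx).
    """
    k = max(1, int(len(fg_scores) * keep_ratio))
    order = sorted(range(len(fg_scores)), key=lambda i: fg_scores[i], reverse=True)
    return order[:k], order[k:]


def aggregate_background(bg_features, queries):
    """Pool many background features into len(queries) compact tokens.

    Each query (hypothetical; learned in a real model) attends over all
    background features with softmax weights and returns their weighted mean.
    """
    tokens = []
    for q in queries:
        logits = [sum(a * b for a, b in zip(q, f)) for f in bg_features]
        peak = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(l - peak) for l in logits]
        total = sum(weights)
        tokens.append([
            sum(w * f[d] for w, f in zip(weights, bg_features)) / total
            for d in range(len(q))
        ])
    return tokens


def sparse_scene_tokens(features, fg_scores, queries, keep_ratio=0.25):
    """Return the kept foreground features followed by the background tokens."""
    fg_idx, bg_idx = sample_foreground(fg_scores, keep_ratio)
    foreground = [features[i] for i in fg_idx]
    background = [features[i] for i in bg_idx]
    bg_tokens = aggregate_background(background, queries) if background else []
    return foreground + bg_tokens
```

With N input features, the downstream decoder in this sketch would see only `int(N * keep_ratio)` foreground features plus `len(queries)` background tokens, which is the kind of feature-volume reduction the abstract attributes the FLOPs savings to.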
ArticleNumber 102955
Author Jin, Yue
Chen, Yuxuan
Yang, Lin
Shi, Zhen
Akiyama, Kanta
Li, Jingzhong
Xu, Anze
Author_xml – sequence: 1
  givenname: Jingzhong
  surname: Li
  fullname: Li, Jingzhong
  email: jzhong_l@sjtu.edu.cn
– sequence: 2
  givenname: Lin
  surname: Yang
  fullname: Yang, Lin
  email: yanglin@sjtu.edu.cn
– sequence: 3
  givenname: Zhen
  surname: Shi
  fullname: Shi, Zhen
  email: shi_zhen@sjtu.edu.cn
– sequence: 4
  givenname: Yuxuan
  surname: Chen
  fullname: Chen, Yuxuan
  email: chenyxSJTU@sjtu.edu.cn
– sequence: 5
  givenname: Yue
  surname: Jin
  fullname: Jin, Yue
  email: jinyue1919@sjtu.edu.cn
– sequence: 6
  givenname: Kanta
  surname: Akiyama
  fullname: Akiyama, Kanta
  email: mtaki1216@sjtu.edu.cn
– sequence: 7
  givenname: Anze
  surname: Xu
  fullname: Xu, Anze
  email: slasher@sjtu.edu.cn
Cites_doi 10.1109/TIV.2019.2955905
10.1109/TPAMI.2023.3292030
10.1016/j.aei.2022.101641
10.1007/978-3-030-58536-5_45
10.1007/978-3-030-58452-8_13
10.1016/j.patcog.2024.110457
10.1007/978-3-031-19812-0_31
10.1007/978-3-031-19809-0_8
10.1007/s11263-022-01579-8
10.1109/TPAMI.2023.3286409
10.1007/978-3-030-58568-6_12
10.1016/j.aei.2023.102348
10.1016/j.aei.2023.101971
10.1016/j.aei.2023.102249
10.1109/CVPR.2009.5206848
10.1016/j.patcog.2023.109997
10.1016/j.aei.2023.102007
10.1007/978-3-031-20077-9_1
10.1016/j.aei.2023.102061
10.1016/j.aei.2023.102069
ContentType Journal Article
Copyright 2024
Copyright_xml – notice: 2024
DBID AAYXX
CITATION
DOI 10.1016/j.aei.2024.102955
DatabaseName CrossRef
DatabaseTitle CrossRef
DatabaseTitleList
DeliveryMethod fulltext_linktorsrc
Discipline Applied Sciences
ExternalDocumentID 10_1016_j_aei_2024_102955
S1474034624006062
ISICitedReferencesCount 1
ISSN 1474-0346
IsPeerReviewed true
IsScholarly true
Keywords Sparse scene representation
Multi-view cameras
Autonomous driving
Bird’s eye view
3D object detection
Language English
LinkModel OpenURL
PublicationCentury 2000
PublicationDate October 2024
2024-10-00
PublicationDateYYYYMMDD 2024-10-01
PublicationDate_xml – month: 10
  year: 2024
  text: October 2024
PublicationDecade 2020
PublicationTitle Advanced engineering informatics
PublicationYear 2024
Publisher Elsevier Ltd
Publisher_xml – sequence: 0
  name: Elsevier Ltd
References Zhu, Zhang, Zhang, Zhu, Guan, Jia (b0140) 2023; 57
Wang, Zhu, Pang, Lin (b0225) 2021
Jiang, Zhang, Miao, Zhu, Gao, Hu, Jiang (b0275) 2023
Fan, Wang, Wang, Zhang (b0120) 2022; 35
Fan, Yang, Wang, Wang, Zhang (b0200) 2023
Z. Li, W. Wang, H. Li, E. Xie, C. Sima, T. Lu, Y. Qiao, J. Dai, Bevformer: Learning bird’s-eye-view representation from multi-camera images via spatiotemporal transformers, in: Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IX, Springer Nature Switzerland Cham, 2022, pp. 1–18.
Y. Wang, V.C. Guizilini, T. Zhang, Y. Wang, H. Zhao, J. Solomon, Detr3d: 3d object detection from multi-view images via 3d-to-2d queries, in: Conference on Robot Learning, PMLR, 2022, pp. 180–191.
Li, Wang, Wu, Chen, Hu, Li, Tang, Yang (b0215) 2020; 33
Reading, Harakeh, Chae, Waslander (b0210) 2021
Li, Yang, Chen, Yang, Jin, Akiyama (b0015) 2023
Wang, Zhang, Zhang, Zhang, Liang, Huang, Huang (b0040) 2024; 59
Sun, Zhang, Jiang, Kong, Xu, Zhan, Tomizuka, Yuan, Luo (b0190) 2023
Li, Ge, Yu, Yang, Wang, Shi, Sun, Li (b0075) 2023
H. Zhang, F. Li, S. Liu, L. Zhang, H. Su, J. Zhu, L.M. Ni, H.-Y. Shum, Dino: Detr with improved denoising anchor boxes for end-to-end object detection, arXiv preprint arXiv:2203.03605, (2022).
X. Lin, T. Lin, Z. Pei, L. Huang, Z. Su, Sparse4d: multi-view 3d object detection with sparse spatial-temporal fusion, arXiv preprint arXiv:2211.10581, (2022).
X. Zhu, W. Su, L. Lu, B. Li, X. Wang, J. Dai, Deformable detr: deformable transformers for end-to-end object detection, arXiv preprint arXiv:2010.04159, (2020).
Huang, Wang, Wen, Wang, Cai (b0005) 2023; 56
Xiong, Gong, Ye, Tan, Wan, Ding, Wang, Bai (b0110) 2023
Fan, Pang, Zhang, Wang, Zhao, Wang, Wang, Zhang (b0205) 2022
K. Mangalam, H. Girase, S. Agarwal, K.-H. Lee, E. Adeli, J. Malik, A. Gaidon, It is not the journey but the destination: endpoint conditioned trajectory prediction, in: European Conference on Computer Vision, Springer, 2020, pp. 759-776.
Wang, Jiang, Li (b0100) 2023
Wang, Liu, Wang, Li, Zhang (b0105) 2023
B. Roh, J. Shin, W. Shin, S. Kim, Sparse DETR: efficient end-to-end object detection with learnable sparsity, in: International Conference on Learning Representations, 2021.
Stoiber, Pfanne, Strobl, Triebel, Albu-Schäffer (b0020) 2022; 130
Y. Liu, T. Wang, X. Zhang, J. Sun, Petr: Position embedding transformation for multi-view 3d object detection, in: European Conference on Computer Vision, Springer, 2022, pp. 531–548.
Jiang, Li, Liu, Wang, Jia, Wang, Han, Zhang (b0115) 2024
Wang, Yuan, Chen, Feng, Yan (b0160) 2021
Liu, Teng, Lu, Wang, Wang (b0070) 2023
Roy, Bhaduri (b0145) 2023; 56
Tao, Cao, Cheng, Gao, Luo, Zhang, Zheng (b0195) 2023; 57
N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, S. Zagoruyko, End-to-end object detection with transformers, in: European conference on computer vision, Springer, 2020, pp. 213–229.
Emunds, Pauen, Richter, Frisch, van Treeck (b0180) 2022; 53
Tian, Shen, Chen, He (b0220) 2020; 44
Lee, Hwang, Lee, Bae, Park (b0245) 2019
Caesar, Bankiti, Lang, Vora, Liong, Xu, Krishnan, Pan, Baldan, Beijbom (b0235) 2020
Park, Ambrus, Guizilini, Li, Gaidon (b0255) 2021
He, Zhang, Ren, Sun (b0240) 2016
J. Huang, G. Huang, Z. Zhu, Y. Ye, D. Du, Bevdet: high-performance multi-camera 3d object detection in bird-eye-view, arXiv preprint arXiv:2112.11790, (2021).
Li, Fan, Liu, Huang, Chen, Wang, Zhang (b0125) 2024
Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, Kaiser, Polosukhin (b0095) 2017; 30
Cao, Zhou, Tao, Xue, Gao, Zhang, Zhu (b0010) 2024; 59
Li, Zhang, Liu, Guo, Ni, Zhang (b0170) 2022
Li, Bao, Ge, Yang, Sun, Li (b0065) 2023
Zhang, Jiang, Qiu, Liu (b0135) 2024; 110630
J. Lu, Z. Zhou, X. Zhu, H. Xu, L. Zhang, Learning ego 3d representation as ray tracing, European Conference on Computer Vision, Springer, 2022, pp. 129-144.
Li, Yang, Chen, Jin (b0030) 2024; 146
Chen, Tian, Wang, Wang, Xiong, Li (b0280) 2024
J. Philion, S. Fidler, Lift, splat, shoot: encoding images from arbitrary camera rigs by implicitly unprojecting to 3d, in: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XIV 16, Springer, 2020, pp. 194–210.
Yang, Chen, Tian, Tao, Zhu, Zhang, Huang, Li, Qiao, Lu (b0090) 2023
Hoel, Driggs-Campbell, Wolff, Laine, Kochenderfer (b0045) 2019; 5
I. Loshchilov, F. Hutter, Decoupled weight decay regularization, arXiv preprint arXiv:1711.05101, (2017).
Najibi, Rastegari, Davis (b0185) 2016
Nguyen, Quach, Duong, Phung, Le, Luu (b0025) 2024; 152
Zhao, Shi, Qi, Wang, Jia (b0130) 2017
Rezatofighi, Tsoi, Gwak, Sadeghian, Reid, Savarese (b0230) 2019
J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, L. Fei-Fei, Imagenet: a large-scale hierarchical image database, 2009 IEEE conference on computer vision and pattern recognition, Ieee, 2009, pp. 248–255.
References_xml – reference: H. Zhang, F. Li, S. Liu, L. Zhang, H. Su, J. Zhu, L.M. Ni, H.-Y. Shum, Dino: Detr with improved denoising anchor boxes for end-to-end object detection, arXiv preprint arXiv:2203.03605, (2022).
– volume: 130
  start-page: 1008
  year: 2022
  end-page: 1030
  ident: b0020
  article-title: SRT3D: a sparse region-based 3D object tracking approach for the real world
  publication-title: Int. J. Comput. Vis.
– start-page: 18580
  year: 2023
  end-page: 18590
  ident: b0070
  article-title: Sparsebev: high-performance sparse 3d object detection from multi-camera videos
  publication-title: Proceedings of the IEEE/CVF International Conference on Computer Vision
– volume: 57
  year: 2023
  ident: b0195
  article-title: An efficient 3D object detection method based on Fast Guided Anchor Stereo RCNN
  publication-title: Adv. Eng. Inf.
– start-page: 913
  year: 2021
  end-page: 922
  ident: b0225
  article-title: Fcos3d: fully convolutional one-stage monocular 3d object detection
  publication-title: Proceedings of the IEEE/CVF International Conference on Computer Vision
– volume: 59
  year: 2024
  ident: b0040
  article-title: Machining feature process route planning based on a graph convolutional neural network
  publication-title: Adv. Eng. Inf.
– start-page: 3142
  year: 2021
  end-page: 3152
  ident: b0255
  article-title: Is pseudo-lidar needed for monocular 3d object detection?
  publication-title: Proceedings of the IEEE/CVF International Conference on Computer Vision
– reference: B. Roh, J. Shin, W. Shin, S. Kim, Sparse DETR: efficient end-to-end object detection with learnable sparsity, in: International Conference on Learning Representations, 2021.
– reference: J. Lu, Z. Zhou, X. Zhu, H. Xu, L. Zhang, Learning ego 3d representation as ray tracing, European Conference on Computer Vision, Springer, 2022, pp. 129-144.
– year: 2023
  ident: b0200
  article-title: Super sparse 3d object detection
  publication-title: IEEE Trans. Pattern Anal. Mach. Intell.
– reference: Y. Liu, T. Wang, X. Zhang, J. Sun, Petr: Position embedding transformation for multi-view 3d object detection, in: European Conference on Computer Vision, Springer, 2022, pp. 531–548.
– start-page: 4661
  year: 2021
  end-page: 4670
  ident: b0160
  article-title: Pnp-detr: towards efficient visual analysis with transformers
  publication-title: Proceedings of the IEEE/CVF International Conference on Computer Vision
– reference: I. Loshchilov, F. Hutter, Decoupled weight decay regularization, arXiv preprint arXiv:1711.05101, (2017).
– start-page: 3621
  year: 2023
  end-page: 3631
  ident: b0105
  article-title: Exploring object-centric temporal modeling for efficient multi-view 3d object detection
  publication-title: Proceedings of the IEEE/CVF International Conference on Computer Vision
– volume: 152
  year: 2024
  ident: b0025
  article-title: Multi-camera multi-object tracking on the move via single-stage global association approach
  publication-title: Pattern Recogn.
– volume: 56
  year: 2023
  ident: b0145
  article-title: DenseSPH-YOLOv5: an automated damage detection model based on DenseNet and Swin-Transformer prediction head-enabled YOLOv5 with attention mechanism
  publication-title: Adv. Eng. Inf.
– reference: J. Huang, G. Huang, Z. Zhu, Y. Ye, D. Du, Bevdet: high-performance multi-camera 3d object detection in bird-eye-view, arXiv preprint arXiv:2112.11790, (2021).
– start-page: 2881
  year: 2017
  end-page: 2890
  ident: b0130
  article-title: Pyramid scene parsing network
  publication-title: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
– start-page: 11621
  year: 2020
  end-page: 11631
  ident: b0235
  article-title: nuscenes: a multimodal dataset for autonomous driving
  publication-title: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
– start-page: 2369
  year: 2016
  end-page: 2377
  ident: b0185
  article-title: G-cnn: an iterative grid based object detector
  publication-title: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
– start-page: 770
  year: 2016
  end-page: 778
  ident: b0240
  article-title: Deep residual learning for image recognition
  publication-title: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
– volume: 5
  start-page: 294
  year: 2019
  end-page: 305
  ident: b0045
  article-title: Combining planning and deep reinforcement learning in tactical decision making for autonomous driving
  publication-title: IEEE Trans. Intell. Veh.
– reference: J. Philion, S. Fidler, Lift, splat, shoot: encoding images from arbitrary camera rigs by implicitly unprojecting to 3d, in: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XIV 16, Springer, 2020, pp. 194–210.
– volume: 110630
  year: 2024
  ident: b0135
  article-title: TCFAP-Net: transformer-based Cross-feature Fusion and Adaptive Perception Network for large-scale point cloud semantic segmentation
  publication-title: Pattern Recogn.
– volume: 56
  year: 2023
  ident: b0005
  article-title: An object detection algorithm combining semantic and geometric information of the 3D point cloud
  publication-title: Adv. Eng. Inf.
– reference: Z. Li, W. Wang, H. Li, E. Xie, C. Sima, T. Lu, Y. Qiao, J. Dai, Bevformer: Learning bird’s-eye-view representation from multi-camera images via spatiotemporal transformers, in: Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IX, Springer Nature Switzerland Cham, 2022, pp. 1–18.
– reference: K. Mangalam, H. Girase, S. Agarwal, K.-H. Lee, E. Adeli, J. Malik, A. Gaidon, It is not the journey but the destination: endpoint conditioned trajectory prediction, in: European Conference on Computer Vision, Springer, 2020, pp. 759-776.
– reference: J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, L. Fei-Fei, Imagenet: a large-scale hierarchical image database, 2009 IEEE conference on computer vision and pattern recognition, Ieee, 2009, pp. 248–255.
– volume: 57
  year: 2023
  ident: b0140
  article-title: Surface defect detection and classification of steel using an efficient Swin Transformer
  publication-title: Adv. Eng. Inf.
– volume: 53
  year: 2022
  ident: b0180
  article-title: SpaRSE-BIM: classification of IFC-based geometry via sparse convolutional neural networks
  publication-title: Adv. Eng. Inf.
– year: 2024
  ident: b0125
  article-title: Fully sparse fusion for 3d object detection
  publication-title: IEEE Trans. Pattern Anal. Mach. Intell.
– year: 2019
  ident: b0245
  article-title: An energy and GPU-computation efficient backbone network for real-time object detection
  publication-title: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops
– volume: 44
  start-page: 1922
  year: 2020
  end-page: 1933
  ident: b0220
  article-title: FCOS: a simple and strong anchor-free object detector
  publication-title: IEEE Trans. Pattern Anal. Mach. Intell.
– start-page: 1851
  year: 2023
  end-page: 1857
  ident: b0015
  article-title: PillarDAN: Pillar-based Dual Attention Attention Network for 3D Object Detection with 4D RaDAR
  publication-title: 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), IEEE
– start-page: 1477
  year: 2023
  end-page: 1485
  ident: b0075
  article-title: Bevdepth: acquisition of reliable depth for multi-view 3d object detection
  publication-title: Proceedings of the AAAI Conference on Artificial Intelligence
– start-page: 658
  year: 2019
  end-page: 666
  ident: b0230
  article-title: Generalized intersection over union: a metric and a loss for bounding box regression
  publication-title: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
– volume: 59
  year: 2024
  ident: b0010
  article-title: VSL-Net: Voxel structure learning for 3D object detection
  publication-title: Adv. Eng. Inf.
– year: 2023
  ident: b0100
  article-title: Focal-petr: embracing foreground for efficient multi-camera 3d object detection
  publication-title: IEEE Trans. Intell. Veh.
– start-page: 2561
  year: 2024
  end-page: 2569
  ident: b0115
  article-title: Far3d: expanding the horizon for surround-view 3d object detection
  publication-title: Proceedings of the AAAI Conference on Artificial Intelligence
– start-page: 8555
  year: 2021
  end-page: 8564
  ident: b0210
  article-title: Categorical depth distribution network for monocular 3d object detection
  publication-title: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
– reference: N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, S. Zagoruyko, End-to-end object detection with transformers, in: European conference on computer vision, Springer, 2020, pp. 213–229.
– start-page: 21570
  year: 2023
  end-page: 21579
  ident: b0110
  article-title: Cape: camera view position embedding for multi-view 3d object detection
  publication-title: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
– volume: 33
  start-page: 21002
  year: 2020
  end-page: 21012
  ident: b0215
  article-title: Generalized focal loss: learning qualified and distributed bounding boxes for dense object detection
  publication-title: Adv. Neural Inf. Proces. Syst.
– start-page: 1042
  year: 2023
  end-page: 1050
  ident: b0275
  article-title: Polarformer: multi-camera 3d object detection with polar transformer
  publication-title: Proceedings of the AAAI Conference on Artificial Intelligence
– year: 2023
  ident: b0190
  article-title: Sparse R-CNN: an end-to-end framework for object detection
  publication-title: IEEE Trans. Pattern Anal. Mach. Intell.
– reference: X. Lin, T. Lin, Z. Pei, L. Huang, Z. Su, Sparse4d: multi-view 3d object detection with sparse spatial-temporal fusion, arXiv preprint arXiv:2211.10581, (2022).
– volume: 146
  year: 2024
  ident: b0030
  article-title: MFAN: Mixing Feature Attention Network for trajectory prediction
  publication-title: Pattern Recogn.
– reference: Y. Wang, V.C. Guizilini, T. Zhang, Y. Wang, H. Zhao, J. Solomon, Detr3d: 3d object detection from multi-view images via 3d-to-2d queries, in: Conference on Robot Learning, PMLR, 2022, pp. 180–191.
– volume: 35
  start-page: 351
  year: 2022
  end-page: 363
  ident: b0120
  article-title: Fully sparse 3d object detection
  publication-title: Adv. Neural Inf. Proces. Syst.
– start-page: 13619
  year: 2022
  end-page: 13627
  ident: b0170
  article-title: Dn-detr: accelerate detr training by introducing query denoising
  publication-title: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
– volume: 30
  year: 2017
  ident: b0095
  article-title: Attention is all you need
  publication-title: Adv. Neural Inf. Proces. Syst.
– start-page: 1486
  year: 2023
  end-page: 1494
  ident: b0065
  article-title: Bevstereo: enhancing depth estimation in multi-view 3d object detection with temporal stereo
  publication-title: Proceedings of the AAAI Conference on Artificial Intelligence
– reference: X. Zhu, W. Su, L. Lu, B. Li, X. Wang, J. Dai, Deformable detr: deformable transformers for end-to-end object detection, arXiv preprint arXiv:2010.04159, (2020).
– start-page: 17830
  year: 2023
  end-page: 17839
  ident: b0090
  article-title: BEVFormer v2: adapting modern image backbones to bird's-eye-view recognition via perspective supervision
  publication-title: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
– start-page: 8458
  year: 2022
  end-page: 8468
  ident: b0205
  article-title: Embracing single stride 3d object detector with sparse transformer
  publication-title: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
– year: 2024
  ident: b0280
  article-title: EPro-PnP: generalized end-to-end probabilistic perspective-N-points for monocular object pose estimation
  publication-title: IEEE Trans. Pattern Anal. Mach. Intell.
– volume: 5
  start-page: 294
  year: 2019
  ident: 10.1016/j.aei.2024.102955_b0045
  article-title: Combining planning and deep reinforcement learning in tactical decision making for autonomous driving
  publication-title: IEEE Trans. Intell. Veh.
  doi: 10.1109/TIV.2019.2955905
– year: 2023
  ident: 10.1016/j.aei.2024.102955_b0190
  article-title: Sparse R-CNN: an end-to-end framework for object detection
  publication-title: IEEE Trans. Pattern Anal. Mach. Intell.
  doi: 10.1109/TPAMI.2023.3292030
– start-page: 3142
  year: 2021
  ident: 10.1016/j.aei.2024.102955_b0255
  article-title: Is pseudo-lidar needed for monocular 3d object detection?
– start-page: 13619
  year: 2022
  ident: 10.1016/j.aei.2024.102955_b0170
  article-title: Dn-detr: accelerate detr training by introducing query denoising
– start-page: 658
  year: 2019
  ident: 10.1016/j.aei.2024.102955_b0230
  article-title: Generalized intersection over union: a metric and a loss for bounding box regression
– volume: 44
  start-page: 1922
  year: 2020
  ident: 10.1016/j.aei.2024.102955_b0220
  article-title: FCOS: a simple and strong anchor-free object detector
  publication-title: IEEE Trans. Pattern Anal. Mach. Intell.
– start-page: 11621
  year: 2020
  ident: 10.1016/j.aei.2024.102955_b0235
  article-title: nuscenes: a multimodal dataset for autonomous driving
– volume: 53
  year: 2022
  ident: 10.1016/j.aei.2024.102955_b0180
  article-title: SpaRSE-BIM: classification of IFC-based geometry via sparse convolutional neural networks
  publication-title: Adv. Eng. Inf.
StartPage 102955
SubjectTerms 3D object detection
Autonomous driving
Bird’s eye view
Multi-view cameras
Sparse scene representation
Title SparseDet: Towards efficient multi-view 3D object detection via sparse scene representation
URI https://dx.doi.org/10.1016/j.aei.2024.102955
Volume 62