SparseDet: Towards efficient multi-view 3D object detection via sparse scene representation
| Published in: | Advanced engineering informatics, vol. 62, p. 102955 |
|---|---|
| Main authors: | Li, Jingzhong; Yang, Lin; Shi, Zhen; Chen, Yuxuan; Jin, Yue; Akiyama, Kanta; Xu, Anze |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Ltd, 1 October 2024 |
| Subject headings: | |
| ISSN: | 1474-0346 |
| Online access: | Full text |
| Abstract | • An efficient multi-view 3D object detection algorithm, SparseDet, is proposed to construct a sparse scene representation, which markedly reduces computational demands while improving detection accuracy. • A sparse sampling strategy that uses both category-aware and geometry-aware supervision is introduced to enhance foreground recognition. • A background aggregation module is designed to condense extensive background features into a compact set, which significantly reduces computational cost while adaptively retaining contextual information. • Experimental results demonstrate the superiority of our method over state-of-the-art methods in both detection accuracy and inference speed.
Efficient and reliable 3D object detection via multi-view cameras is pivotal for improving the safety and facilitating the cost-effective deployment of autonomous driving systems. However, because they learn dense scene representations, existing methods still suffer from high computational costs and excessive noise, which degrade the efficiency and accuracy of inference. To overcome this challenge, we propose SparseDet, a model that exploits sparse scene representations. Specifically, a sparse sampling module with category-aware and geometry-aware supervision is first introduced to adaptively sample foreground features at both semantic and instance levels. Additionally, to conserve computational resources while retaining context information, we propose a background aggregation module designed to compress extensive background features into a compact set. These strategies markedly reduce feature volume while preserving essential information, boosting computational efficiency without compromising accuracy. Owing to the efficient sparse scene representation, SparseDet achieves leading performance on the widely used nuScenes benchmark. Comprehensive experiments validate that SparseDet surpasses PETR while reducing the decoder's computational complexity by 47% in terms of FLOPs, enabling a leading inference speed of 35.6 FPS on a single RTX 3090 GPU. |
|---|---|
| ArticleNumber | 102955 |
| Author | Jin, Yue Chen, Yuxuan Yang, Lin Shi, Zhen Akiyama, Kanta Li, Jingzhong Xu, Anze |
| Author_xml | – sequence: 1 givenname: Jingzhong surname: Li fullname: Li, Jingzhong email: jzhong_l@sjtu.edu.cn – sequence: 2 givenname: Lin surname: Yang fullname: Yang, Lin email: yanglin@sjtu.edu.cn – sequence: 3 givenname: Zhen surname: Shi fullname: Shi, Zhen email: shi_zhen@sjtu.edu.cn – sequence: 4 givenname: Yuxuan surname: Chen fullname: Chen, Yuxuan email: chenyxSJTU@sjtu.edu.cn – sequence: 5 givenname: Yue surname: Jin fullname: Jin, Yue email: jinyue1919@sjtu.edu.cn – sequence: 6 givenname: Kanta surname: Akiyama fullname: Akiyama, Kanta email: mtaki1216@sjtu.edu.cn – sequence: 7 givenname: Anze surname: Xu fullname: Xu, Anze email: slasher@sjtu.edu.cn |
| Cites_doi | 10.1109/TIV.2019.2955905 10.1109/TPAMI.2023.3292030 10.1016/j.aei.2022.101641 10.1007/978-3-030-58536-5_45 10.1007/978-3-030-58452-8_13 10.1016/j.patcog.2024.110457 10.1007/978-3-031-19812-0_31 10.1007/978-3-031-19809-0_8 10.1007/s11263-022-01579-8 10.1109/TPAMI.2023.3286409 10.1007/978-3-030-58568-6_12 10.1016/j.aei.2023.102348 10.1016/j.aei.2023.101971 10.1016/j.aei.2023.102249 10.1109/CVPR.2009.5206848 10.1016/j.patcog.2023.109997 10.1016/j.aei.2023.102007 10.1007/978-3-031-20077-9_1 10.1016/j.aei.2023.102061 10.1016/j.aei.2023.102069 |
| ContentType | Journal Article |
| Copyright | 2024 |
| Copyright_xml | – notice: 2024 |
| DBID | AAYXX CITATION |
| DOI | 10.1016/j.aei.2024.102955 |
| DatabaseName | CrossRef |
| DatabaseTitle | CrossRef |
| DatabaseTitleList | |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Applied Sciences |
| ExternalDocumentID | 10_1016_j_aei_2024_102955 S1474034624006062 |
| ID | FETCH-LOGICAL-c249t-df2703669a58a12b2574ac68fd1c61cdd5168c361499cb61ba69c1733faf8b523 |
| ISICitedReferencesCount | 1 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001361218500001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 1474-0346 |
| IngestDate | Sat Nov 29 03:19:55 EST 2025 Sat Dec 14 16:15:12 EST 2024 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | Sparse scene representation; Multi-view cameras; Autonomous driving; Bird’s eye view; 3D object detection |
| Language | English |
| LinkModel | OpenURL |
| MergedId | FETCHMERGED-LOGICAL-c249t-df2703669a58a12b2574ac68fd1c61cdd5168c361499cb61ba69c1733faf8b523 |
| ParticipantIDs | crossref_primary_10_1016_j_aei_2024_102955 elsevier_sciencedirect_doi_10_1016_j_aei_2024_102955 |
| PublicationCentury | 2000 |
| PublicationDate | October 2024 2024-10-00 |
| PublicationDateYYYYMMDD | 2024-10-01 |
| PublicationDate_xml | – month: 10 year: 2024 text: October 2024 |
| PublicationDecade | 2020 |
| PublicationTitle | Advanced engineering informatics |
| PublicationYear | 2024 |
| Publisher | Elsevier Ltd |
| Publisher_xml | – sequence: 0 name: Elsevier Ltd |
| References_xml | – reference: H. Zhang, F. Li, S. Liu, L. Zhang, H. Su, J. Zhu, L.M. Ni, H.-Y. Shum, Dino: Detr with improved denoising anchor boxes for end-to-end object detection, arXiv preprint arXiv:2203.03605, (2022). – volume: 130 start-page: 1008 year: 2022 end-page: 1030 ident: b0020 article-title: SRT3D: a sparse region-based 3D object tracking approach for the real world publication-title: Int. J. Comput. Vis. – start-page: 18580 year: 2023 end-page: 18590 ident: b0070 article-title: Sparsebev: high-performance sparse 3d object detection from multi-camera videos publication-title: Proceedings of the IEEE/CVF International Conference on Computer Vision – volume: 57 year: 2023 ident: b0195 article-title: An efficient 3D object detection method based on Fast Guided Anchor Stereo RCNN publication-title: Adv. Eng. Inf. – start-page: 913 year: 2021 end-page: 922 ident: b0225 article-title: Fcos3d: fully convolutional one-stage monocular 3d object detection publication-title: Proceedings of the IEEE/CVF International Conference on Computer Vision – volume: 59 year: 2024 ident: b0040 article-title: Machining feature process route planning based on a graph convolutional neural network publication-title: Adv. Eng. Inf. – start-page: 3142 year: 2021 end-page: 3152 ident: b0255 article-title: Is pseudo-lidar needed for monocular 3d object detection? publication-title: Proceedings of the IEEE/CVF International Conference on Computer Vision – reference: B. Roh, J. Shin, W. Shin, S. Kim, Sparse DETR: efficient end-to-end object detection with learnable sparsity, in: International Conference on Learning Representations, 2021. – reference: J. Lu, Z. Zhou, X. Zhu, H. Xu, L. Zhang, Learning ego 3d representation as ray tracing, European Conference on Computer Vision, Springer, 2022, pp. 129-144. – year: 2023 ident: b0200 article-title: Super sparse 3d object detection publication-title: IEEE Trans. Pattern Anal. Mach. Intell. – reference: Y. Liu, T. Wang, X. Zhang, J. 
Sun, Petr: Position embedding transformation for multi-view 3d object detection, in: European Conference on Computer Vision, Springer, 2022, pp. 531–548. – start-page: 4661 year: 2021 end-page: 4670 ident: b0160 article-title: Pnp-detr: towards efficient visual analysis with transformers publication-title: Proceedings of the IEEE/CVF International Conference on Computer Vision – reference: I. Loshchilov, F. Hutter, Decoupled weight decay regularization, arXiv preprint arXiv:1711.05101, (2017). – start-page: 3621 year: 2023 end-page: 3631 ident: b0105 article-title: Exploring object-centric temporal modeling for efficient multi-view 3d object detection publication-title: Proceedings of the IEEE/CVF International Conference on Computer Vision – volume: 152 year: 2024 ident: b0025 article-title: Multi-camera multi-object tracking on the move via single-stage global association approach publication-title: Pattern Recogn. – volume: 56 year: 2023 ident: b0145 article-title: DenseSPH-YOLOv5: an automated damage detection model based on DenseNet and Swin-Transformer prediction head-enabled YOLOv5 with attention mechanism publication-title: Adv. Eng. Inf. – reference: J. Huang, G. Huang, Z. Zhu, Y. Ye, D. Du, Bevdet: high-performance multi-camera 3d object detection in bird-eye-view, arXiv preprint arXiv:2112.11790, (2021). 
– start-page: 2881 year: 2017 end-page: 2890 ident: b0130 article-title: Pyramid scene parsing network publication-title: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition – start-page: 11621 year: 2020 end-page: 11631 ident: b0235 article-title: nuscenes: a multimodal dataset for autonomous driving publication-title: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition – start-page: 2369 year: 2016 end-page: 2377 ident: b0185 article-title: G-cnn: an iterative grid based object detector publication-title: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition – start-page: 770 year: 2016 end-page: 778 ident: b0240 article-title: Deep residual learning for image recognition publication-title: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition – volume: 5 start-page: 294 year: 2019 end-page: 305 ident: b0045 article-title: Combining planning and deep reinforcement learning in tactical decision making for autonomous driving publication-title: IEEE Trans. Intell. Veh. – reference: J. Philion, S. Fidler, Lift, splat, shoot: encoding images from arbitrary camera rigs by implicitly unprojecting to 3d, in: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XIV 16, Springer, 2020, pp. 194–210. – volume: 110630 year: 2024 ident: b0135 article-title: TCFAP-Net: transformer-based Cross-feature Fusion and Adaptive Perception Network for large-scale point cloud semantic segmentation publication-title: Pattern Recogn. – volume: 56 year: 2023 ident: b0005 article-title: An object detection algorithm combining semantic and geometric information of the 3D point cloud publication-title: Adv. Eng. Inf. – reference: Z. Li, W. Wang, H. Li, E. Xie, C. Sima, T. Lu, Y. Qiao, J. 
Dai, Bevformer: Learning bird’s-eye-view representation from multi-camera images via spatiotemporal transformers, in: Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IX, Springer Nature Switzerland Cham, 2022, pp. 1–18. – reference: K. Mangalam, H. Girase, S. Agarwal, K.-H. Lee, E. Adeli, J. Malik, A. Gaidon, It is not the journey but the destination: endpoint conditioned trajectory prediction, in: European Conference on Computer Vision, Springer, 2020, pp. 759-776. – reference: J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, L. Fei-Fei, Imagenet: a large-scale hierarchical image database, 2009 IEEE conference on computer vision and pattern recognition, Ieee, 2009, pp. 248–255. – volume: 57 year: 2023 ident: b0140 article-title: Surface defect detection and classification of steel using an efficient Swin Transformer publication-title: Adv. Eng. Inf. – volume: 53 year: 2022 ident: b0180 article-title: SpaRSE-BIM: classification of IFC-based geometry via sparse convolutional neural networks publication-title: Adv. Eng. Inf. – year: 2024 ident: b0125 article-title: Fully sparse fusion for 3d object detection publication-title: IEEE Trans. Pattern Anal. Mach. Intell. – year: 2019 ident: b0245 article-title: An energy and GPU-computation efficient backbone network for real-time object detection publication-title: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops – volume: 44 start-page: 1922 year: 2020 end-page: 1933 ident: b0220 article-title: FCOS: a simple and strong anchor-free object detector publication-title: IEEE Trans. Pattern Anal. Mach. Intell. 
– start-page: 1851 year: 2023 end-page: 1857 ident: b0015 article-title: PillarDAN: Pillar-based Dual Attention Attention Network for 3D Object Detection with 4D RaDAR publication-title: 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), IEEE – start-page: 1477 year: 2023 end-page: 1485 ident: b0075 article-title: Bevdepth: acquisition of reliable depth for multi-view 3d object detection publication-title: Proceedings of the AAAI Conference on Artificial Intelligence – start-page: 658 year: 2019 end-page: 666 ident: b0230 article-title: Generalized intersection over union: a metric and a loss for bounding box regression publication-title: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition – volume: 59 year: 2024 ident: b0010 article-title: VSL-Net: Voxel structure learning for 3D object detection publication-title: Adv. Eng. Inf. – year: 2023 ident: b0100 article-title: Focal-petr: embracing foreground for efficient multi-camera 3d object detection publication-title: IEEE Trans. Intell. Veh. – start-page: 2561 year: 2024 end-page: 2569 ident: b0115 article-title: Far3d: expanding the horizon for surround-view 3d object detection publication-title: Proceedings of the AAAI Conference on Artificial Intelligence – start-page: 8555 year: 2021 end-page: 8564 ident: b0210 article-title: Categorical depth distribution network for monocular 3d object detection publication-title: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition – reference: N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, S. Zagoruyko, End-to-end object detection with transformers, in: European conference on computer vision, Springer, 2020, pp. 213–229. 
– start-page: 21570 year: 2023 end-page: 21579 ident: b0110 article-title: Cape: camera view position embedding for multi-view 3d object detection publication-title: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition – volume: 33 start-page: 21002 year: 2020 end-page: 21012 ident: b0215 article-title: Generalized focal loss: learning qualified and distributed bounding boxes for dense object detection publication-title: Adv. Neural Inf. Proces. Syst. – start-page: 1042 year: 2023 end-page: 1050 ident: b0275 article-title: Polarformer: multi-camera 3d object detection with polar transformer publication-title: Proceedings of the AAAI Conference on Artificial Intelligence – year: 2023 ident: b0190 article-title: Sparse R-CNN: an end-to-end framework for object detection publication-title: IEEE Trans. Pattern Anal. Mach. Intell. – reference: X. Lin, T. Lin, Z. Pei, L. Huang, Z. Su, Sparse4d: multi-view 3d object detection with sparse spatial-temporal fusion, arXiv preprint arXiv:2211.10581, (2022). – volume: 146 year: 2024 ident: b0030 article-title: MFAN: Mixing Feature Attention Network for trajectory prediction publication-title: Pattern Recogn. – reference: Y. Wang, V.C. Guizilini, T. Zhang, Y. Wang, H. Zhao, J. Solomon, Detr3d: 3d object detection from multi-view images via 3d-to-2d queries, in: Conference on Robot Learning, PMLR, 2022, pp. 180–191. – volume: 35 start-page: 351 year: 2022 end-page: 363 ident: b0120 article-title: Fully sparse 3d object detection publication-title: Adv. Neural Inf. Proces. Syst. – start-page: 13619 year: 2022 end-page: 13627 ident: b0170 article-title: Dn-detr: accelerate detr training by introducing query denoising publication-title: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition – volume: 30 year: 2017 ident: b0095 article-title: Attention is all you need publication-title: Adv. Neural Inf. Proces. Syst. 
– start-page: 1486 year: 2023 end-page: 1494 ident: b0065 article-title: Bevstereo: enhancing depth estimation in multi-view 3d object detection with temporal stereo publication-title: Proceedings of the AAAI Conference on Artificial Intelligence – reference: X. Zhu, W. Su, L. Lu, B. Li, X. Wang, J. Dai, Deformable detr: deformable transformers for end-to-end object detection, arXiv preprint arXiv:2010.04159, (2020). – start-page: 17830 year: 2023 end-page: 17839 ident: b0090 article-title: BEVFormer v2: adapting modern image backbones to bird's-eye-view recognition via perspective supervision publication-title: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition – start-page: 8458 year: 2022 end-page: 8468 ident: b0205 article-title: Embracing single stride 3d object detector with sparse transformer publication-title: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition – year: 2024 ident: b0280 article-title: EPro-PnP: generalized end-to-end probabilistic perspective-N-points for monocular object pose estimation publication-title: IEEE Trans. Pattern Anal. Mach. Intell. – volume: 5 start-page: 294 year: 2019 ident: 10.1016/j.aei.2024.102955_b0045 article-title: Combining planning and deep reinforcement learning in tactical decision making for autonomous driving publication-title: IEEE Trans. Intell. Veh. doi: 10.1109/TIV.2019.2955905 – year: 2023 ident: 10.1016/j.aei.2024.102955_b0190 article-title: Sparse R-CNN: an end-to-end framework for object detection publication-title: IEEE Trans. Pattern Anal. Mach. Intell. doi: 10.1109/TPAMI.2023.3292030 – start-page: 3142 year: 2021 ident: 10.1016/j.aei.2024.102955_b0255 article-title: Is pseudo-lidar needed for monocular 3d object detection? 
– start-page: 13619 year: 2022 ident: 10.1016/j.aei.2024.102955_b0170 article-title: Dn-detr: accelerate detr training by introducing query denoising – start-page: 658 year: 2019 ident: 10.1016/j.aei.2024.102955_b0230 article-title: Generalized intersection over union: a metric and a loss for bounding box regression – volume: 44 start-page: 1922 year: 2020 ident: 10.1016/j.aei.2024.102955_b0220 article-title: FCOS: a simple and strong anchor-free object detector publication-title: IEEE Trans. Pattern Anal. Mach. Intell. – start-page: 11621 year: 2020 ident: 10.1016/j.aei.2024.102955_b0235 article-title: nuscenes: a multimodal dataset for autonomous driving – volume: 53 year: 2022 ident: 10.1016/j.aei.2024.102955_b0180 article-title: SpaRSE-BIM: classification of IFC-based geometry via sparse convolutional neural networks publication-title: Adv. Eng. Inf. doi: 10.1016/j.aei.2022.101641 – ident: 10.1016/j.aei.2024.102955_b0035 doi: 10.1007/978-3-030-58536-5_45 – start-page: 2561 year: 2024 ident: 10.1016/j.aei.2024.102955_b0115 article-title: Far3d: expanding the horizon for surround-view 3d object detection – ident: 10.1016/j.aei.2024.102955_b0150 doi: 10.1007/978-3-030-58452-8_13 – start-page: 1477 year: 2023 ident: 10.1016/j.aei.2024.102955_b0075 article-title: Bevdepth: acquisition of reliable depth for multi-view 3d object detection – ident: 10.1016/j.aei.2024.102955_b0055 – volume: 110630 year: 2024 ident: 10.1016/j.aei.2024.102955_b0135 article-title: TCFAP-Net: transformer-based Cross-feature Fusion and Adaptive Perception Network for large-scale point cloud semantic segmentation publication-title: Pattern Recogn. 
– start-page: 3621 year: 2023 ident: 10.1016/j.aei.2024.102955_b0105 article-title: Exploring object-centric temporal modeling for efficient multi-view 3d object detection – volume: 152 year: 2024 ident: 10.1016/j.aei.2024.102955_b0025 article-title: Multi-camera multi-object tracking on the move via single-stage global association approach publication-title: Pattern Recogn. doi: 10.1016/j.patcog.2024.110457 – ident: 10.1016/j.aei.2024.102955_b0060 doi: 10.1007/978-3-031-19812-0_31 – ident: 10.1016/j.aei.2024.102955_b0270 doi: 10.1007/978-3-031-19809-0_8 – start-page: 1042 year: 2023 ident: 10.1016/j.aei.2024.102955_b0275 article-title: Polarformer: multi-camera 3d object detection with polar transformer – volume: 130 start-page: 1008 year: 2022 ident: 10.1016/j.aei.2024.102955_b0020 article-title: SRT3D: a sparse region-based 3D object tracking approach for the real world publication-title: Int. J. Comput. Vis. doi: 10.1007/s11263-022-01579-8 – year: 2023 ident: 10.1016/j.aei.2024.102955_b0200 article-title: Super sparse 3d object detection publication-title: IEEE Trans. Pattern Anal. Mach. Intell. doi: 10.1109/TPAMI.2023.3286409 – ident: 10.1016/j.aei.2024.102955_b0085 doi: 10.1007/978-3-030-58568-6_12 – start-page: 1486 year: 2023 ident: 10.1016/j.aei.2024.102955_b0065 article-title: Bevstereo: enhancing depth estimation in multi-view 3d object detection with temporal stereo – start-page: 2369 year: 2016 ident: 10.1016/j.aei.2024.102955_b0185 article-title: G-cnn: an iterative grid based object detector – volume: 59 year: 2024 ident: 10.1016/j.aei.2024.102955_b0010 article-title: VSL-Net: Voxel structure learning for 3D object detection publication-title: Adv. Eng. Inf. doi: 10.1016/j.aei.2023.102348 – volume: 56 year: 2023 ident: 10.1016/j.aei.2024.102955_b0005 article-title: An object detection algorithm combining semantic and geometric information of the 3D point cloud publication-title: Adv. Eng. Inf. 
doi: 10.1016/j.aei.2023.101971 – volume: 59 year: 2024 ident: 10.1016/j.aei.2024.102955_b0040 article-title: Machining feature process route planning based on a graph convolutional neural network publication-title: Adv. Eng. Inf. doi: 10.1016/j.aei.2023.102249 – volume: 33 start-page: 21002 year: 2020 ident: 10.1016/j.aei.2024.102955_b0215 article-title: Generalized focal loss: learning qualified and distributed bounding boxes for dense object detection publication-title: Adv. Neural Inf. Proces. Syst. – start-page: 770 year: 2016 ident: 10.1016/j.aei.2024.102955_b0240 article-title: Deep residual learning for image recognition – year: 2024 ident: 10.1016/j.aei.2024.102955_b0125 article-title: Fully sparse fusion for 3d object detection publication-title: IEEE Trans. Pattern Anal. Mach. Intell. – ident: 10.1016/j.aei.2024.102955_b0250 doi: 10.1109/CVPR.2009.5206848 – volume: 146 year: 2024 ident: 10.1016/j.aei.2024.102955_b0030 article-title: MFAN: Mixing Feature Attention Network for trajectory prediction publication-title: Pattern Recogn. doi: 10.1016/j.patcog.2023.109997 – year: 2023 ident: 10.1016/j.aei.2024.102955_b0100 article-title: Focal-petr: embracing foreground for efficient multi-camera 3d object detection publication-title: IEEE Trans. Intell. Veh. – ident: 10.1016/j.aei.2024.102955_b0165 – volume: 56 year: 2023 ident: 10.1016/j.aei.2024.102955_b0145 article-title: DenseSPH-YOLOv5: an automated damage detection model based on DenseNet and Swin-Transformer prediction head-enabled YOLOv5 with attention mechanism publication-title: Adv. Eng. Inf. 
doi: 10.1016/j.aei.2023.102007 – start-page: 1851 year: 2023 ident: 10.1016/j.aei.2024.102955_b0015 article-title: PillarDAN: Pillar-based Dual Attention Attention Network for 3D Object Detection with 4D RaDAR – start-page: 17830 year: 2023 ident: 10.1016/j.aei.2024.102955_b0090 article-title: BEVFormer v2: adapting modern image backbones to bird's-eye-view recognition via perspective supervision – ident: 10.1016/j.aei.2024.102955_b0265 – year: 2019 ident: 10.1016/j.aei.2024.102955_b0245 article-title: An energy and GPU-computation efficient backbone network for real-time object detection – start-page: 21570 year: 2023 ident: 10.1016/j.aei.2024.102955_b0110 article-title: Cape: camera view position embedding for multi-view 3d object detection – start-page: 8555 year: 2021 ident: 10.1016/j.aei.2024.102955_b0210 article-title: Categorical depth distribution network for monocular 3d object detection – year: 2024 ident: 10.1016/j.aei.2024.102955_b0280 article-title: EPro-PnP: generalized end-to-end probabilistic perspective-N-points for monocular object pose estimation publication-title: IEEE Trans. Pattern Anal. Mach. Intell. – ident: 10.1016/j.aei.2024.102955_b0175 – ident: 10.1016/j.aei.2024.102955_b0080 – start-page: 913 year: 2021 ident: 10.1016/j.aei.2024.102955_b0225 article-title: Fcos3d: fully convolutional one-stage monocular 3d object detection – ident: 10.1016/j.aei.2024.102955_b0050 doi: 10.1007/978-3-031-20077-9_1 – start-page: 4661 year: 2021 ident: 10.1016/j.aei.2024.102955_b0160 article-title: Pnp-detr: towards efficient visual analysis with transformers – start-page: 18580 year: 2023 ident: 10.1016/j.aei.2024.102955_b0070 article-title: Sparsebev: high-performance sparse 3d object detection from multi-camera videos – volume: 30 year: 2017 ident: 10.1016/j.aei.2024.102955_b0095 article-title: Attention is all you need publication-title: Adv. Neural Inf. Proces. Syst. 
– ident: 10.1016/j.aei.2024.102955_b0260 – start-page: 8458 year: 2022 ident: 10.1016/j.aei.2024.102955_b0205 article-title: Embracing single stride 3d object detector with sparse transformer – volume: 57 year: 2023 ident: 10.1016/j.aei.2024.102955_b0140 article-title: Surface defect detection and classification of steel using an efficient Swin Transformer publication-title: Adv. Eng. Inf. doi: 10.1016/j.aei.2023.102061 – start-page: 2881 year: 2017 ident: 10.1016/j.aei.2024.102955_b0130 article-title: Pyramid scene parsing network – ident: 10.1016/j.aei.2024.102955_b0155 – volume: 35 start-page: 351 year: 2022 ident: 10.1016/j.aei.2024.102955_b0120 article-title: Fully sparse 3d object detection publication-title: Adv. Neural Inf. Proces. Syst. – volume: 57 year: 2023 ident: 10.1016/j.aei.2024.102955_b0195 article-title: An efficient 3D object detection method based on Fast Guided Anchor Stereo RCNN publication-title: Adv. Eng. Inf. doi: 10.1016/j.aei.2023.102069 |
| SSID | ssj0016897 |
| Snippet | •An efficient multi-view 3D object detection algorithm, SparseDet, is proposed to construct sparse scene representation, which can remarkably diminish... |
| SourceID | crossref; elsevier |
| SourceType | Index Database; Publisher |
| StartPage | 102955 |
| SubjectTerms | 3D object detection; Autonomous driving; Bird’s eye view; Multi-view cameras; Sparse scene representation |
| Title | SparseDet: Towards efficient multi-view 3D object detection via sparse scene representation |
| URI | https://dx.doi.org/10.1016/j.aei.2024.102955 |
| Volume | 62 |
| WOSCitedRecordID | wos001361218500001 |
| hasFullText | 1 |
| inHoldings | 1 |
| journalDatabaseRights | – providerCode: PRVESC databaseName: Elsevier SD Freedom Collection Journals 2021 issn: 1474-0346 databaseCode: AIEXJ dateStart: 20020101 customDbUrl: isFulltext: true dateEnd: 99991231 titleUrlDefault: https://www.sciencedirect.com omitProxy: false ssIdentifier: ssj0016897 providerName: Elsevier |
| linkProvider | Elsevier |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=SparseDet%3A+Towards+efficient+multi-view+3D+object+detection+via+sparse+scene+representation&rft.jtitle=Advanced+engineering+informatics&rft.au=Li%2C+Jingzhong&rft.au=Yang%2C+Lin&rft.au=Shi%2C+Zhen&rft.au=Chen%2C+Yuxuan&rft.date=2024-10-01&rft.issn=1474-0346&rft.volume=62&rft.spage=102955&rft_id=info:doi/10.1016%2Fj.aei.2024.102955&rft.externalDBID=n%2Fa&rft.externalDocID=10_1016_j_aei_2024_102955 |