LVP: Leverage Virtual Points in Multimodal Early Fusion for 3-D Object Detection

Bibliographic details
Published in: IEEE Transactions on Geoscience and Remote Sensing, Vol. 63, pp. 1-15
Main authors: Chen, Yidong; Cai, Guorong; Song, Ziying; Liu, Zhaoliang; Zeng, Binghui; Li, Jonathan; Wang, Zongyue
Format: Journal Article
Language: English
Published: New York, IEEE, 2025
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 0196-2892, 1558-0644
Description
Summary: Because point clouds are sparse and prone to occlusion, detectors that rely on point clouds alone are of limited effectiveness on sparse or occluded objects. Researchers have therefore been actively exploring multimodal fusion to address this bottleneck of LiDAR-based detection. In particular, virtual points, generated through depth completion from front-view RGB images, offer the potential for better integration with point clouds. Nevertheless, recent approaches fuse these two modalities at the region of interest (RoI) stage, which limits the effectiveness of the fusion because the RoIs produced by the point cloud branch are inaccurate, especially for hard samples. To overcome this limitation and unleash the potential of virtual points while retaining late fusion, we present Leverage Virtual Points (LVP), a high-performance 3-D object detector that leverages virtual points in early fusion to enhance the quality of RoI generation. LVP consists of three early fusion modules, virtual points painting (VPP), virtual points auxiliary (VPA), and virtual points completion (VPC), which achieve point-level and global-level fusion. The integration of these modules improves occlusion handling and the detection of distant small objects. On the KITTI benchmark, LVP achieves 85.45% 3-D mAP. On the large-scale nuScenes dataset, LVP improves the detection accuracy of large objects by compensating for errors in depth estimation. Without bells and whistles, these results establish LVP as a strong solution for outdoor 3-D object detection.
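
The virtual points mentioned in the summary are 3-D points recovered from a completed depth map of the RGB image. The sketch below is not the LVP implementation; it only illustrates the general idea under stated assumptions: a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, a dense depth map already produced by some depth completion step (not shown), and points expressed in the camera frame. The function name `depth_to_virtual_points` is likewise an illustrative choice.

```python
def depth_to_virtual_points(depth, rgb, fx, fy, cx, cy):
    """Back-project a depth map into coloured 3-D points (camera frame).

    depth: H x W nested list of metric depths; values <= 0 mean "no depth"
    rgb:   H x W nested list of (r, g, b) tuples aligned with depth
    fx, fy, cx, cy: assumed pinhole intrinsics (focal lengths, principal point)
    Returns a list of (x, y, z, r, g, b) tuples, one per valid pixel.
    """
    points = []
    for v, row in enumerate(depth):        # v: pixel row
        for u, z in enumerate(row):        # u: pixel column
            if z <= 0:
                continue                   # skip pixels without a depth value
            x = (u - cx) * z / fx          # inverse pinhole projection
            y = (v - cy) * z / fy
            r, g, b = rgb[v][u]
            points.append((x, y, z, r, g, b))
    return points


# Toy 3 x 4 image with two valid depth pixels (made-up values).
depth = [[0.0] * 4 for _ in range(3)]
depth[1][2] = 10.0
depth[2][3] = 5.0
rgb = [[(0, 0, 0)] * 4 for _ in range(3)]
rgb[1][2] = (255, 0, 0)

virtual_points = depth_to_virtual_points(depth, rgb, fx=100.0, fy=100.0, cx=2.0, cy=1.0)
```

Each returned point carries its source pixel's colour, which is what makes point-level fusion with a LiDAR point cloud possible once both sets are transformed into a common coordinate frame. A practical pipeline would vectorise this loop and apply the camera-to-LiDAR extrinsic calibration, neither of which is shown here.
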
DOI: 10.1109/TGRS.2024.3519386