LVP: Leverage Virtual Points in Multimodal Early Fusion for 3-D Object Detection

Bibliographic Details
Published in: IEEE Transactions on Geoscience and Remote Sensing, Vol. 63, pp. 1-15
Main Authors: Chen, Yidong, Cai, Guorong, Song, Ziying, Liu, Zhaoliang, Zeng, Binghui, Li, Jonathan, Wang, Zongyue
Format: Journal Article
Language: English
Published: New York IEEE 2025
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 0196-2892, 1558-0644
Description
Summary: Because point clouds are sparse and prone to occlusion, detectors that rely on point clouds alone perform poorly on sparse or occluded objects. Researchers have therefore been actively exploring multimodal fusion to overcome this bottleneck of LiDAR-based detection. In particular, virtual points, generated through depth completion from front-view RGB images, offer the potential for better integration with point clouds. Nevertheless, recent approaches fuse the two modalities within the region of interest (RoI), which limits fusion effectiveness because the RoIs produced by the point cloud branch are inaccurate, especially for hard samples. To overcome this limitation and unleash the potential of virtual points, we present Leverage Virtual Points (LVP), a high-performance 3-D object detector that leverages virtual points in early fusion, in combination with late fusion, to enhance the quality of RoI generation. LVP consists of three early fusion modules, virtual points painting (VPP), virtual points auxiliary (VPA), and virtual points completion (VPC), which together achieve point-level and global-level fusion. Integrating these modules improves occlusion handling and the detection of distant small objects. On the KITTI benchmark, LVP achieves 85.45% 3-D mAP. On the larger nuScenes dataset, the method improves the detection accuracy of large objects by compensating for errors in depth estimation. Without bells and whistles, these results establish LVP as a strong solution for outdoor 3-D object detection.
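The summary states that virtual points are generated through depth completion from a front-view RGB image, but this record gives no further detail. As a rough illustration only, the sketch below shows the generic pinhole back-projection that turns an already completed (dense) depth map into 3-D points in the camera frame. It is not the authors' implementation: the function name `depth_to_virtual_points`, the intrinsics `fx`, `fy`, `cx`, `cy`, the `stride` parameter, and the toy values are all assumptions made for the example, and the depth completion step itself is assumed to have been done elsewhere.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def depth_to_virtual_points(
    depth: Sequence[Sequence[Optional[float]]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    stride: int = 1,
) -> List[Point]:
    """Back-project a dense depth map into 3-D points in the camera frame.

    depth[v][u] is the depth in metres at pixel column u, row v. A value of
    None or a non-positive value marks a pixel with no usable depth.
    All parameter names are hypothetical and chosen for this sketch.
    """
    points: List[Point] = []
    for v in range(0, len(depth), stride):
        row = depth[v]
        for u in range(0, len(row), stride):
            z = row[u]
            if z is None or z <= 0.0:
                continue  # skip pixels without a usable depth estimate
            # Invert the pinhole model u = fx * x / z + cx, v = fy * y / z + cy.
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


if __name__ == "__main__":
    # Toy 2x2 depth map with one invalid pixel (made-up values).
    toy_depth = [[10.0, 0.0], [20.0, 5.0]]
    for point in depth_to_virtual_points(toy_depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5):
        print(point)
```

The points come out in the camera coordinate frame. Combining them with a LiDAR point cloud would additionally require the camera-to-LiDAR extrinsic calibration, which is outside the scope of this sketch.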
DOI: 10.1109/TGRS.2024.3519386