Real-Time 3D Visual Perception by Cross-Dimensional Refined Learning

Detailed bibliography
Published in: IEEE Transactions on Circuits and Systems for Video Technology, Volume 34, Issue 10, pp. 10326–10338
Main authors: Hong, Ziyang; Patrick Yue, C.
Format: Journal Article
Language: English
Published: New York: IEEE, 01.10.2024
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 1051-8215, 1558-2205
Description
Summary: We introduce a novel learning method that can effectively perceive both the geometric structure and semantic labels of a 3D scene in real time. Existing real-time 3D scene reconstruction approaches often rely on volumetric schemes to regress a Truncated Signed Distance Function (TSDF) as the 3D representation. However, these volumetric approaches primarily focus on ensuring global coherence in the reconstructed scene, which often results in a lack of local geometric detail. To address this limitation, we propose a solution that leverages the latent geometric knowledge in 2D image features via explicit depth prediction, creating anchored features that refine the learning of occupancy in the TSDF volume. Furthermore, we find that this cross-dimensional feature refinement methodology can also be applied to semantic segmentation by utilizing semantic priors. We therefore propose an end-to-end cross-dimensional refinement neural network (CDRNet) that extracts both the 3D mesh and the 3D semantic labeling of a scene in real time. Through experimental evaluation on multiple datasets, we demonstrate that our method achieves state-of-the-art 3D perception, improving 3D semantic segmentation by over 40% and geometric reconstruction by over 18% relative to the prior art. These promising results indicate the significant potential of our approach for various industrial applications. A demo video and code can be found on the project page: https://hafred.github.io/cdrnet/
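The anchoring step described in the abstract — back-projecting 2D image features at explicitly predicted depths so they land at known 3D locations in the TSDF volume — can be illustrated with a minimal NumPy sketch. The function name `anchor_features`, the tensor shapes, and the mean-pooling of features per voxel are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def anchor_features(feat2d, depth, K, voxel_size, grid_dim):
    """Back-project per-pixel 2D features to 3D using a predicted depth map,
    then scatter them into a voxel grid, mean-pooling features per voxel.

    feat2d: (H, W, C) 2D feature map; depth: (H, W) predicted depth;
    K: 3x3 camera intrinsics; returns a (D, D, D, C) anchored-feature volume.
    """
    H, W, C = feat2d.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))  # pixel coordinates
    # Pinhole back-projection: pixel (u, v) at depth z -> camera-frame point.
    z = depth
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)

    # Quantize points into voxel indices and drop those outside the grid.
    idx = np.floor(pts / voxel_size).astype(int)
    valid = np.all((idx >= 0) & (idx < grid_dim), axis=1)
    idx = idx[valid]
    f = feat2d.reshape(-1, C)[valid]

    # Scatter-add features and counts, then average per occupied voxel.
    vol = np.zeros((grid_dim, grid_dim, grid_dim, C))
    cnt = np.zeros((grid_dim, grid_dim, grid_dim, 1))
    flat = np.ravel_multi_index(idx.T, (grid_dim,) * 3)
    np.add.at(vol.reshape(-1, C), flat, f)
    np.add.at(cnt.reshape(-1, 1), flat, 1.0)
    return vol / np.maximum(cnt, 1.0)
```

In the paper's pipeline such an anchored-feature volume would then condition the network that regresses TSDF occupancy (and, with semantic priors, the semantic labels); this sketch only shows the cross-dimensional lifting itself.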
DOI:10.1109/TCSVT.2024.3406401