Learning to Recover 3D Scene Shape from a Single Image

Bibliographic details
Published in:	Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online), pp. 204-213
Main authors:	Yin, Wei; Zhang, Jianming; Wang, Oliver; Niklaus, Simon; Mai, Long; Chen, Simon; Shen, Chunhua
Format:	Conference paper
Language:	English
Published:	IEEE 01.01.2021
Subjects:	Estimation; Geometry; Predictive models; Reconstruction algorithms; Shape; Three-dimensional displays; Training
ISSN:	1063-6919

Description
Summary:	Despite significant progress in monocular depth estimation in the wild, recent state-of-the-art methods cannot be used to recover accurate 3D scene shape due to an unknown depth shift induced by shift-invariant reconstruction losses used in mixed-data depth prediction training, and possible unknown camera focal length. We investigate this problem in detail, and propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image, and then uses 3D point cloud encoders to predict the missing depth shift and focal length that allow us to recover a realistic 3D scene shape. In addition, we propose an image-level normalized regression loss and a normal-based geometry loss to enhance depth prediction models trained on mixed datasets. We test our depth model on nine unseen datasets and achieve state-of-the-art performance on zero-shot dataset generalization. Code is available at: https://git.io/Depth
DOI:	10.1109/CVPR46437.2021.00027
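
The summary notes that an unknown depth shift and an unknown focal length prevent accurate 3D shape recovery. A minimal sketch of why both quantities matter, assuming a standard pinhole camera with the principal point at the image centre and square pixels: each pixel is back-projected using its shifted depth and the focal length, so an error in either one changes the resulting point cloud. The function name `unproject_depth` and its parameters are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def unproject_depth(depth, focal_length, shift=0.0):
    """Back-project an (H, W) depth map into an (H*W, 3) point cloud.

    Assumptions (not from the paper's code): pinhole camera, principal
    point at the image centre, square pixels, focal length in pixels.
    """
    h, w = depth.shape
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0

    # Pixel coordinate grids: u runs along columns, v along rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))

    # Apply the depth shift before back-projection.
    z = depth + shift
    x = (u - cx) * z / focal_length
    y = (v - cy) * z / focal_length
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)


# Usage: the same depth map yields different geometry under a different shift.
depth = np.full((3, 3), 2.0)
points = unproject_depth(depth, focal_length=1.0)
points_shifted = unproject_depth(depth, focal_length=1.0, shift=1.0)
```

In this sketch the shift changes not only the z coordinate but also x and y, because both are scaled by the shifted depth. This is one way to see why a shift-invariant depth prediction alone does not determine the scene shape.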