Learning to Recover 3D Scene Shape from a Single Image

Published in: Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online), pp. 204-213
Main authors: Yin, Wei; Zhang, Jianming; Wang, Oliver; Niklaus, Simon; Mai, Long; Chen, Simon; Shen, Chunhua
Format: Conference paper
Language: English
Published: IEEE, 2021-01-01
ISSN: 1063-6919
Abstract Despite significant progress in monocular depth estimation in the wild, recent state-of-the-art methods cannot be used to recover accurate 3D scene shape due to an unknown depth shift induced by shift-invariant reconstruction losses used in mixed-data depth prediction training, and possible unknown camera focal length. We investigate this problem in detail, and propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image, and then uses 3D point cloud encoders to predict the missing depth shift and focal length that allow us to recover a realistic 3D scene shape. In addition, we propose an image-level normalized regression loss and a normal-based geometry loss to enhance depth prediction models trained on mixed datasets. We test our depth model on nine unseen datasets and achieve state-of-the-art performance on zero-shot dataset generalization. Code is available at: https://git.io/Depth
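Illustrative note (not part of the catalog record or the paper's code): the abstract's point that an unknown depth shift prevents accurate 3D shape recovery can be sketched with a standard pinhole unprojection. The function names `unproject` and `angle_deg`, the toy depth values, the centred principal point, and the unit focal length are all assumptions made for this example. Three pixels that lie on a straight 3D line stay collinear only when the correct shift is added back to the predicted depth.

```python
# Minimal sketch under stated assumptions: a generic pinhole camera model,
# NOT the authors' implementation. All names here are hypothetical.
import math


def unproject(depth, focal, shift=0.0):
    """Lift a depth map (list of rows) to 3D points with a pinhole camera.

    Assumes square pixels and a principal point at the image centre.
    """
    h, w = len(depth), len(depth[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    points = []
    for v in range(h):
        for u in range(w):
            z = depth[v][u] + shift      # add the (estimated) depth shift
            x = (u - cx) * z / focal     # back-project along the pixel ray
            y = (v - cy) * z / focal
            points.append((x, y, z))
    return points


def angle_deg(a, b, c):
    """Angle at vertex b formed by the 3D points a, b, c, in degrees."""
    ba = [a[i] - b[i] for i in range(3)]
    bc = [c[i] - b[i] for i in range(3)]
    dot = sum(p * q for p, q in zip(ba, bc))
    norm = math.sqrt(sum(p * p for p in ba)) * math.sqrt(sum(q * q for q in bc))
    cos = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cos))


# Toy prediction: true depths (4, 2, 4/3) lie on a 3D line; the model output
# is assumed to be off by an unknown shift of 1.0.
pred = [[3.0, 1.0, 1.0 / 3.0]]
straight = unproject(pred, focal=1.0, shift=1.0)  # correct shift applied
bent = unproject(pred, focal=1.0, shift=0.0)      # shift ignored
print(angle_deg(*straight), angle_deg(*bent))
```

With the correct shift the angle at the middle point is a straight angle; without it the line visibly bends. A wrong focal length distorts the shape in a similar way by rescaling x and y relative to z, which is why the abstract describes estimating both quantities.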
Author_xml – sequence: 1
  givenname: Wei
  surname: Yin
  fullname: Yin, Wei
  organization: The University of Adelaide, Australia
– sequence: 2
  givenname: Jianming
  surname: Zhang
  fullname: Zhang, Jianming
  organization: Adobe Research
– sequence: 3
  givenname: Oliver
  surname: Wang
  fullname: Wang, Oliver
  organization: Adobe Research
– sequence: 4
  givenname: Simon
  surname: Niklaus
  fullname: Niklaus, Simon
  organization: Adobe Research
– sequence: 5
  givenname: Long
  surname: Mai
  fullname: Mai, Long
  organization: Adobe Research
– sequence: 6
  givenname: Simon
  surname: Chen
  fullname: Chen, Simon
  organization: Adobe Research
– sequence: 7
  givenname: Chunhua
  surname: Shen
  fullname: Shen, Chunhua
  organization: The University of Adelaide, Australia
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/CVPR46437.2021.00027
Discipline Applied Sciences
EISBN 1665445092
9781665445092
EISSN 1063-6919
EndPage 213
ExternalDocumentID 9577432
Genre orig-research
ISICitedReferencesCount 121
IngestDate Wed Aug 27 02:24:15 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
PageCount 10
PublicationCentury 2000
PublicationDate 2021-01-01
PublicationDateYYYYMMDD 2021-01-01
PublicationDate_xml – month: 01
  year: 2021
  text: 2021-01-01
  day: 01
PublicationDecade 2020
PublicationTitle Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online)
PublicationTitleAbbrev CVPR
PublicationYear 2021
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 204
SubjectTerms Estimation
Geometry
Predictive models
Reconstruction algorithms
Shape
Three-dimensional displays
Training
Title Learning to Recover 3D Scene Shape from a Single Image
URI https://ieeexplore.ieee.org/document/9577432