Learning to Recover 3D Scene Shape from a Single Image

Detailed bibliography
Published in:	Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online), pp. 204-213
Main authors:	Yin, Wei; Zhang, Jianming; Wang, Oliver; Niklaus, Simon; Mai, Long; Chen, Simon; Shen, Chunhua
Format:	Conference paper
Language:	English
Published:	IEEE 01.01.2021
Subjects:	Estimation; Geometry; Predictive models; Reconstruction algorithms; Shape; Three-dimensional displays; Training
ISSN:	1063-6919

Abstract	Despite significant progress in monocular depth estimation in the wild, recent state-of-the-art methods cannot be used to recover accurate 3D scene shape due to an unknown depth shift induced by shift-invariant reconstruction losses used in mixed-data depth prediction training, and possible unknown camera focal length. We investigate this problem in detail, and propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image, and then uses 3D point cloud encoders to predict the missing depth shift and focal length that allow us to recover a realistic 3D scene shape. In addition, we propose an image-level normalized regression loss and a normal-based geometry loss to enhance depth prediction models trained on mixed datasets. We test our depth model on nine unseen datasets and achieve state-of-the-art performance on zero-shot dataset generalization. Code is available at: https://git.io/Depth
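
The abstract's point that an unknown depth shift prevents accurate 3D shape recovery can be illustrated with a minimal sketch. It assumes a pinhole camera with the principal point at the image center and a synthetic planar scene; the function names `unproject` and `planarity_residual` and all numeric values are hypothetical choices for illustration, not taken from the authors' code. The sketch only shows the geometric effect of a missing shift; it does not implement the paper's point cloud encoders.

```python
import numpy as np

def unproject(depth, focal, shift=0.0):
    """Back-project an (H, W) depth map to an (H, W, 3) point cloud.

    Assumes a pinhole camera whose principal point is the image center.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w) - (w - 1) / 2.0,
                       np.arange(h) - (h - 1) / 2.0)
    z = depth + shift  # apply the (possibly recovered) depth shift
    return np.stack([u * z / focal, v * z / focal, z], axis=-1)

def planarity_residual(points):
    """RMS distance of the points from their best-fit plane (via SVD)."""
    p = points.reshape(-1, 3)
    p = p - p.mean(axis=0)
    singular_values = np.linalg.svd(p, compute_uv=False)
    return singular_values[-1] / np.sqrt(len(p))

# Synthetic scene: the plane z = 0.5 * x + 2 seen by a camera with focal 50 px.
focal_true = 50.0
h, w = 48, 64
u = np.arange(w) - (w - 1) / 2.0
row = 2.0 / (1.0 - 0.5 * u / focal_true)  # depth along one image row
depth_true = np.tile(row, (h, 1))         # the plane does not depend on v

# Hypothetical network output: correct up to a missing additive shift.
# (A global scale is ignored here because it does not distort shape.)
missing_shift = 1.0
depth_pred = depth_true - missing_shift

# Without the shift the reconstructed surface is bent, not planar.
res_shifted = planarity_residual(unproject(depth_pred, focal_true))
# With the shift restored the surface is planar again.
res_true = planarity_residual(
    unproject(depth_pred, focal_true, shift=missing_shift))

print("residual without shift:", res_shifted)
print("residual with shift:   ", res_true)
```

In this toy setting a wrong focal length alone would keep the plane planar but stretch it along the x and y axes, which is why both quantities matter for recovering a realistic shape.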
Author affiliations	Yin, Wei (The University of Adelaide, Australia); Zhang, Jianming (Adobe Research); Wang, Oliver (Adobe Research); Niklaus, Simon (Adobe Research); Mai, Long (Adobe Research); Chen, Simon (Adobe Research); Shen, Chunhua (The University of Adelaide, Australia)
DOI	10.1109/CVPR46437.2021.00027
EISBN	1665445092 9781665445092
URI	https://ieeexplore.ieee.org/document/9577432