An efficient encoder–decoder model for portrait depth estimation from single images trained on pixel-accurate synthetic data

Bibliographic details
Published in: Neural Networks, Vol. 142, pp. 479-491
Main authors: Khan, Faisal; Hussain, Shahid; Basak, Shubhajit; Lemley, Joseph; Corcoran, Peter
Format: Journal Article
Language: English
Published: Elsevier Ltd, 1 October 2021
ISSN: 0893-6080, 1879-2782
Description
Summary: Depth estimation from a single image frame is a fundamental challenge in computer vision, with applications such as augmented reality, action recognition, image understanding, and autonomous driving. Large and diverse training sets are required for accurate depth estimation from a single image frame. Because dense ground-truth depth is difficult to obtain, a new 3D pipeline of 100 synthetic virtual human models is presented to generate multiple 2D facial images and corresponding ground-truth depth data, allowing complete control over image variations. To validate the synthetic facial depth data, we propose an evaluation of state-of-the-art single-image depth estimation algorithms on the generated synthetic dataset. Furthermore, an improved encoder–decoder based neural network is presented. This network is computationally efficient and performs better than the current state of the art when tested and evaluated across 4 public datasets. Our training methodology relies on synthetic data samples, which provide a more reliable ground truth for depth estimation. Additionally, using a combination of appropriate loss functions yields better performance than current state-of-the-art networks. Our approach clearly outperforms competing methods across different test datasets, setting a new state of the art for facial depth estimation from synthetic data.
DOI: 10.1016/j.neunet.2021.07.007