PSHead: 3D Head Reconstruction from a Single Image with Diffusion Prior and Self‐Enhancement

Bibliographic Details
Published in: Computer Graphics Forum
Main Authors: Yang, Jing; Wu, Tianhan; Fogarty, Kyle; Zhong, Fangcheng; Oztireli, Cengiz
Format: Journal Article
Language: English
Published: 01.10.2025
ISSN: 0167-7055, 1467-8659
Description
Summary: Text-to-3D avatar generation has shown that diffusion models trained on general objects can capture head structure. However, image-to-3D avatar generation, which creates a high-fidelity 3D avatar from a single image, remains challenging due to additional constraints: it requires recovering a detailed 3D representation from limited cues while capturing complex facial features such as wrinkles and hair. To address these challenges, we introduce PSHead, a coarse-to-fine framework guided by both object and face priors that produces a Gaussian-based 3D avatar from a single frontal-view reference image. In the coarse stage, we create an initial 3D representation by applying diffusion models trained for general object generation, using Score Distillation Sampling losses over novel views. This approach marks the first integration of text-to-image, image-to-image, and text-to-video diffusion priors, with insights into each module's contribution to learning a 3D representation. In the fine stage, we refine this representation with pretrained face generation models, which denoise rendered images; the refined outputs then serve as supervision to further improve 3D detail fidelity. Leveraging the versatility of 2D object priors, PSHead is robust across a variety of face framings. Our method outperforms existing approaches on in-the-wild images, demonstrating its robustness and ability to capture intricate details without extensive 3D supervision.
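The coarse stage's use of Score Distillation Sampling can be illustrated with a short sketch. The snippet below is a minimal, hypothetical rendition of one SDS step against a frozen latent-diffusion prior; `renderer`, `diffusion`, and `prompt_embeds` are assumed placeholder interfaces, not PSHead's actual code, and the paper's combination of multiple diffusion priors is omitted.

```python
import torch

def sds_step(renderer, diffusion, camera, prompt_embeds, guidance_scale=7.5):
    """One Score Distillation Sampling step (sketch, not PSHead's implementation).

    Renders a novel view, perturbs it with noise, asks the frozen diffusion
    prior to predict that noise, and turns the prediction error into a
    gradient on the 3D representation behind `renderer`.
    """
    image = renderer(camera)                        # differentiable render, (1, 3, H, W) in [0, 1]
    latents = diffusion.encode(image * 2.0 - 1.0)   # encode to the prior's latent space

    t = torch.randint(20, 980, (1,), device=latents.device)  # random diffusion timestep
    noise = torch.randn_like(latents)
    noisy = diffusion.scheduler.add_noise(latents, noise, t)

    with torch.no_grad():                           # the 2D prior stays frozen
        eps_uncond, eps_cond = diffusion.unet(
            torch.cat([noisy, noisy]), t,
            encoder_hidden_states=prompt_embeds,    # stacked [uncond; cond] embeddings
        ).sample.chunk(2)
    eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)

    # SDS skips the U-Net Jacobian: the weighted noise residual acts directly
    # as the gradient on the latents via a surrogate dot-product loss.
    w = (1.0 - diffusion.scheduler.alphas_cumprod[t]).view(-1, 1, 1, 1)
    grad = (w * (eps - noise)).detach()
    return (grad * latents).sum()                   # backprop sends `grad` into the renderer
```

The fine stage, as the abstract describes it, amounts to denoising a rendering with a pretrained face model and regressing the render toward the cleaned result. Again a hedged sketch, with a hypothetical `face_model.img2img` interface standing in for the face prior:

```python
import torch.nn.functional as F

def refine_step(renderer, face_model, camera, strength=0.3):
    """Fine-stage sketch: a pretrained face model denoises the rendering,
    and the denoised image supervises the 3D representation."""
    rendered = renderer(camera)
    with torch.no_grad():
        # SDEdit-style partial denoise (hypothetical API): small `strength`
        # keeps identity while restoring fine facial detail.
        target = face_model.img2img(rendered, strength=strength)
    return F.mse_loss(rendered, target)
```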
DOI: 10.1111/cgf.70279