PSHead: 3D Head Reconstruction from a Single Image with Diffusion Prior and Self‐Enhancement
| Published in: | Computer Graphics Forum |
|---|---|
| Main Authors: | |
| Format: | Journal Article |
| Language: | English |
| Published: | 01.10.2025 |
| ISSN: | 0167-7055, 1467-8659 |
| Summary: | Text-to-3D avatar generation has shown that diffusion models trained on general objects can capture head structure. However, image-to-3D avatar generation, which creates a high-fidelity 3D avatar from a single image, remains challenging due to additional constraints: it requires recovering a detailed 3D representation from limited cues while capturing complex facial features such as wrinkles and hair. To address these challenges, we introduce PSHead, a coarse-to-fine framework guided by both object and face priors that produces a Gaussian-based 3D avatar from a single frontal-view reference image. In the coarse stage, we create an initial 3D representation by applying diffusion models trained for general object generation, using Score Distillation Sampling losses over novel views. This approach marks the first integration of text-to-image, image-to-image, and text-to-video diffusion priors, with insights into each module's contribution to learning a 3D representation. In the fine stage, we refine this representation with pretrained face generation models, which denoise rendered images and use the refined outputs as supervision to further improve 3D detail fidelity. Leveraging the versatility of 2D object priors, PSHead is robust across various face framings. Our method outperforms existing approaches on in-the-wild images, demonstrating its robustness and ability to capture intricate details without extensive 3D supervision. |
|---|---|
| DOI: | 10.1111/cgf.70279 |
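
The coarse stage summarized above relies on Score Distillation Sampling (SDS) against frozen 2D diffusion priors. As a rough illustration only, here is a minimal PyTorch-style sketch of one SDS step; the paper's actual code is not part of this record, and all names here (`sds_grad`, `sds_loss`, `unet`, `text_emb`, `alphas_cumprod`) are illustrative assumptions, not PSHead's API.

```python
# Minimal sketch of one Score Distillation Sampling (SDS) step against a
# frozen text-to-image diffusion prior. Hypothetical names throughout;
# this is a generic SDS formulation, not the authors' implementation.
import torch

@torch.no_grad()
def sds_grad(unet, latents, text_emb, alphas_cumprod):
    """Return the SDS gradient w(t) * (eps_pred - eps) for one render."""
    b = latents.shape[0]
    t = torch.randint(20, 980, (b,), device=latents.device)   # random timestep
    eps = torch.randn_like(latents)                            # injected noise
    a_t = alphas_cumprod[t].view(b, 1, 1, 1)
    noisy = a_t.sqrt() * latents + (1 - a_t).sqrt() * eps      # forward diffusion
    eps_pred = unet(noisy, t, encoder_hidden_states=text_emb).sample
    w = 1 - a_t                                                # common weighting choice
    return w * (eps_pred - eps)

def sds_loss(unet, latents, text_emb, alphas_cumprod):
    # Reparameterized so that backprop through `latents` yields exactly
    # the SDS gradient, without differentiating through the frozen prior.
    grad = sds_grad(unet, latents, text_emb, alphas_cumprod)
    return (latents * grad).sum() / latents.shape[0]
```

In a typical SDS pipeline, a rendered novel view would first be encoded to `latents` by the diffusion model's VAE, and the resulting loss would backpropagate only into the 3D representation (here, Gaussian parameters), since the prior stays frozen.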