Magic3D: High-Resolution Text-to-3D Content Creation

Bibliographic Details
Published in:	Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online), pp. 300-309
Main Authors:	Lin, Chen-Hsuan; Gao, Jun; Tang, Luming; Takikawa, Towaki; Zeng, Xiaohui; Huang, Xun; Kreis, Karsten; Fidler, Sanja; Liu, Ming-Yu; Lin, Tsung-Yi
Format:	Conference Proceeding
Language:	English
Published:	IEEE, 1 June 2023
Subjects:	Computer vision; Graphics; Image resolution; Optimization; Pattern recognition; Solid modeling; Three-dimensional displays; Vision + graphics
ISSN:	1063-6919
Online Access:	Full text

Description
Summary:	DreamFusion [31] has recently demonstrated the utility of a pretrained text-to-image diffusion model to optimize Neural Radiance Fields (NeRF) [23], achieving remarkable text-to-3D synthesis results. However, the method has two inherent limitations: (a) extremely slow optimization of NeRF and (b) low-resolution image space supervision on NeRF, leading to low-quality 3D models with a long processing time. In this paper, we address these limitations by utilizing a two-stage optimization framework. First, we obtain a coarse model using a low-resolution diffusion prior and accelerate it with a sparse 3D hash grid structure. Using the coarse representation as the initialization, we further optimize a textured 3D mesh model with an efficient differentiable renderer interacting with a high-resolution latent diffusion model. Our method, dubbed Magic3D, can create high-quality 3D mesh models in 40 minutes, which is 2× faster than DreamFusion (reportedly taking 1.5 hours on average), while also achieving higher resolution. User studies show that 61.7% of raters prefer our approach over DreamFusion. Together with the image-conditioned generation capabilities, we provide users with new ways to control 3D synthesis, opening up new avenues for various creative applications.
DOI:	10.1109/CVPR52729.2023.00037