Magic3D: High-Resolution Text-to-3D Content Creation

Detailed bibliography
Published in: Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online), pp. 300-309
Main authors: Lin, Chen-Hsuan, Gao, Jun, Tang, Luming, Takikawa, Towaki, Zeng, Xiaohui, Huang, Xun, Kreis, Karsten, Fidler, Sanja, Liu, Ming-Yu, Lin, Tsung-Yi
Medium: Conference paper
Language: English
Publication details: IEEE, 01.06.2023
ISSN:1063-6919
Description
Summary: DreamFusion [31] has recently demonstrated the utility of a pretrained text-to-image diffusion model to optimize Neural Radiance Fields (NeRF) [23], achieving remarkable text-to-3D synthesis results. However, the method has two inherent limitations: (a) extremely slow optimization of NeRF and (b) low-resolution image space supervision on NeRF, leading to low-quality 3D models with a long processing time. In this paper, we address these limitations by utilizing a two-stage optimization framework. First, we obtain a coarse model using a low-resolution diffusion prior, accelerated with a sparse 3D hash grid structure. Using the coarse representation as the initialization, we further optimize a textured 3D mesh model with an efficient differentiable renderer interacting with a high-resolution latent diffusion model. Our method, dubbed Magic3D, can create high-quality 3D mesh models in 40 minutes, which is 2× faster than DreamFusion (reportedly taking 1.5 hours on average), while also achieving higher resolution. User studies show that 61.7% of raters prefer our approach over DreamFusion. Together with the image-conditioned generation capabilities, we provide users with new ways to control 3D synthesis, opening up new avenues to various creative applications.
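For context, both optimization stages build on the score distillation sampling (SDS) objective introduced in DreamFusion [31], which backpropagates a frozen text-to-image diffusion model's denoising residual through a differentiable renderer. A sketch of the SDS gradient, with notation adapted here rather than quoted from the paper:

\nabla_\theta \mathcal{L}_{\mathrm{SDS}} = \mathbb{E}_{t,\epsilon}\big[\, w(t)\,\big(\hat{\epsilon}_\phi(\mathbf{x}_t;\, y,\, t) - \epsilon\big)\, \partial \mathbf{x} / \partial \theta \,\big], \qquad \mathbf{x} = g(\theta),

where g is the differentiable renderer over 3D parameters \theta (the sparse hash-grid NeRF in the coarse stage, the textured mesh rasterizer in the fine stage), \mathbf{x}_t is the rendered image noised to timestep t with noise \epsilon, \hat{\epsilon}_\phi is the frozen diffusion model's noise prediction for text prompt y, and w(t) is a timestep weighting.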
DOI:10.1109/CVPR52729.2023.00037