V^3: Viewing Volumetric Videos on Mobiles via Streamable 2D Dynamic Gaussians

Detailed bibliography
Title: V^3: Viewing Volumetric Videos on Mobiles via Streamable 2D Dynamic Gaussians
Authors: Penghao Wang, Zhirui Zhang, Liao Wang, Kaixin Yao, Siyuan Xie, Jingyi Yu, Minye Wu, Lan Xu
Source: ACM Transactions on Graphics. 43:1-13
Publication Status: Preprint
Publisher information: Association for Computing Machinery (ACM), 2024.
Publication year: 2024
Subjects: volumetric video, FOS: Computer and information sciences, Technology, REPRESENTATION, Science & Technology, human performance, Computer Vision and Pattern Recognition (cs.CV), RADIANCE FIELDS, Computer Science - Computer Vision and Pattern Recognition, Software Engineering, 02 engineering and technology, 4607 Graphics, augmented reality and games, Computer Science, Software Engineering, mobile rendering, Graphics (cs.GR), Computer Science - Graphics, 0806 Information Systems, Computer Science, 0801 Artificial Intelligence and Image Processing, 0202 electrical engineering, electronic engineering, information engineering, Neural rendering, 3D Gaussian Splatting
Description: Experiencing high-fidelity volumetric video as seamlessly as 2D videos is a long-held dream. However, current dynamic 3DGS methods, despite their high rendering quality, face challenges in streaming on mobile devices due to computational and bandwidth constraints. In this paper, we introduce V^3 (Viewing Volumetric Videos), a novel approach that enables high-quality mobile rendering through the streaming of dynamic Gaussians. Our key innovation is to view dynamic 3DGS as 2D videos, facilitating the use of hardware video codecs. Additionally, we propose a two-stage training strategy to reduce storage requirements with rapid training speed. The first stage employs hash encoding and a shallow MLP to learn motion, then reduces the number of Gaussians through pruning to meet the streaming requirements, while the second stage fine-tunes other Gaussian attributes using a residual entropy loss and a temporal loss to improve temporal continuity. This strategy, which disentangles motion and appearance, maintains high rendering quality with compact storage requirements. Meanwhile, we designed a multi-platform player to decode and render 2D Gaussian videos. Extensive experiments demonstrate the effectiveness of V^3, which outperforms other methods by enabling high-quality rendering and streaming on common devices, a capability not seen before. As the first to stream dynamic Gaussians on mobile devices, our companion player offers users an unprecedented volumetric video experience, including smooth scrolling and instant sharing. Our project page with source code is available at https://authoritywang.github.io/v3/.
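Illustrative note: the idea of viewing dynamic Gaussians as 2D videos can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that each Gaussian keeps a fixed index across frames, that one attribute channel is quantized to 8 bits with a per-frame min/max range, and that the values are laid out row by row in an image that a standard video codec could then compress. The function names `pack_attribute` and `unpack_attribute` are hypothetical.

```python
import math


def pack_attribute(values, width, bits=8):
    """Quantize one attribute channel and lay it out as a 2D image (list of rows).

    Assumption: values[i] always belongs to Gaussian i, so the same pixel
    tracks the same Gaussian from frame to frame.
    """
    if not values:
        raise ValueError("values must not be empty")
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    scale = (hi - lo) or 1.0  # avoid division by zero for a constant channel
    quantized = [round((v - lo) / scale * levels) for v in values]
    height = math.ceil(len(quantized) / width)
    quantized += [0] * (height * width - len(quantized))  # pad the last row
    image = [quantized[r * width:(r + 1) * width] for r in range(height)]
    return image, lo, hi


def unpack_attribute(image, lo, hi, count, bits=8):
    """Invert pack_attribute: flatten the image and dequantize the first `count` pixels."""
    levels = (1 << bits) - 1
    flat = [pixel for row in image for pixel in row][:count]
    return [lo + pixel / levels * (hi - lo) for pixel in flat]


if __name__ == "__main__":
    # Hypothetical opacity values for five Gaussians in a single frame.
    opacities = [0.0, 0.25, 0.5, 0.75, 1.0]
    image, lo, hi = pack_attribute(opacities, width=2)
    restored = unpack_attribute(image, lo, hi, count=len(opacities))
    print(image)
    print(restored)
```

In this sketch the range `(lo, hi)` travels as side information next to each image, and the round trip is lossy only up to the quantization step. A real pipeline would also need to account for the loss introduced by the video codec itself, which this example does not model.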
Document type: Article
Language: English
ISSN: 1557-7368
0730-0301
DOI: 10.1145/3687935
DOI: 10.48550/arxiv.2409.13648
Access URL: http://arxiv.org/abs/2409.13648
https://lirias.kuleuven.be/handle/20.500.12942/759455
https://doi.org/10.1145/3687935
Rights: CC BY
arXiv Non-Exclusive Distribution
Accession number: edsair.doi.dedup.....c22365979b7e34a61a68723add1c2d56
Database: OpenAIRE