PonderV2: Improved 3D Representation With a Universal Pre-Training Paradigm
In contrast to numerous NLP and 2D vision foundational models, training a 3D foundational model poses considerably greater challenges. This is primarily due to the inherent data variability and diversity of downstream tasks. In this paper, we introduce a novel universal 3D pre-training framework des...
Gespeichert in:
| Veröffentlicht in: | IEEE transactions on pattern analysis and machine intelligence Jg. 47; H. 8; S. 6550 - 6565 |
|---|---|
| Hauptverfasser: | , , , , , , , , , , |
| Format: | Journal Article |
| Sprache: | Englisch |
| Veröffentlicht: |
United States
IEEE
01.08.2025
|
| Schlagworte: | |
| ISSN: | 0162-8828, 1939-3539, 2160-9292, 1939-3539 |
| Online-Zugang: | Volltext |
| Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
| Zusammenfassung: | In contrast to numerous NLP and 2D vision foundational models, training a 3D foundational model poses considerably greater challenges. This is primarily due to the inherent data variability and diversity of downstream tasks. In this paper, we introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representations. Considering that informative 3D features should encode rich geometry and appearance cues that can be utilized to render realistic images, we propose to learn 3D representations by differentiable neural rendering. We train a 3D backbone with a volumetric neural renderer by comparing the rendered with the real images. Notably, our pre-trained encoder can be seamlessly applied to various downstream tasks. These tasks include semantic challenges like 3D detection and segmentation, which involve scene understanding, and non-semantic tasks like 3D reconstruction and image synthesis, which focus on geometry and visuals. They span both indoor and outdoor scenarios. We also illustrate the capability of pre-training a 2D backbone using the proposed methodology, surpassing conventional pre-training methods by a large margin. For the first time, PonderV2 achieves state-of-the-art performance on 11 indoor and outdoor benchmarks, implying its effectiveness. |
|---|---|
| Bibliographie: | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23 |
| ISSN: | 0162-8828 1939-3539 2160-9292 1939-3539 |
| DOI: | 10.1109/TPAMI.2025.3561598 |