SLAG: Scalable Language-Augmented Gaussian Splatting

Bibliographic details
Published in:	IEEE Robotics and Automation Letters, Vol. 10, No. 7, pp. 6991-6998
Main authors:	Szilagyi, Laszlo; Engelmann, Francis; Bohg, Jeannette
Format:	Journal Article
Language:	English
Published:	Piscataway: IEEE, 1 July 2025; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:	big data in robotics and automation; Cameras; Coding; deep learning for visual perception; Embedding; Graphics processing units; Image reconstruction; Language; Neural radiance field; Representations; Robotics; Robots; Scalability; Semantic scene understanding; Semantics; Slag; software architecture for robotics and automation; Three-dimensional displays; Vectors
ISSN:	2377-3766

Description
Abstract:	Language-augmented scene representations hold great promise for large-scale robotics applications such as search-and-rescue, smart cities, and mining. Many of these scenarios are time-sensitive, requiring rapid scene encoding while also being data-intensive, necessitating scalable solutions. Deploying these representations on robots with limited computational resources further adds to the challenge. To address this, we introduce SLAG, a multi-GPU framework for language-augmented Gaussian splatting that enhances the speed and scalability of embedding large scenes. Our method integrates 2D visual-language model features into 3D scenes using SAM (Kirillov et al., 2023) and CLIP (Radford et al., 2021). Unlike prior approaches, SLAG eliminates the need for a loss function to compute per-Gaussian language embeddings. Instead, it derives embeddings from 3D Gaussian scene parameters via a normalized weighted average, enabling highly parallelized scene encoding. Additionally, we introduce a vector database for efficient embedding storage and retrieval. Our experiments show that SLAG achieves an 18× speedup in embedding computation on a 16-GPU setup compared to OpenGaussian (Wu et al., 2024), while preserving embedding quality on the ScanNet (Dai et al., 2017) and LERF (Kerr et al., 2023) datasets.
DOI:	10.1109/LRA.2025.3573203
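
Illustrative sketch (not taken from the paper): the abstract says per-Gaussian language embeddings are obtained through a normalized weighted average rather than by minimizing a loss. The Python below shows what such an average could look like in its simplest form. Everything specific here is an assumption made for illustration: the `(gaussian_id, weight, feature)` triples, the idea that each weight is the contribution of a Gaussian to a pixel carrying a 2D feature, and the brute-force cosine lookup standing in for the vector database. It is not SLAG's actual implementation or API.

```python
from math import sqrt


def weighted_average_embeddings(contributions, num_gaussians, dim):
    """Average 2D features into one embedding per Gaussian.

    contributions: iterable of (gaussian_id, weight, feature) triples, where
    `weight` is an assumed per-observation contribution of that Gaussian and
    `feature` is a list of `dim` floats (hypothetical input layout).
    """
    sums = [[0.0] * dim for _ in range(num_gaussians)]
    totals = [0.0] * num_gaussians
    for gid, weight, feature in contributions:
        totals[gid] += weight
        for k in range(dim):
            sums[gid][k] += weight * feature[k]
    # Normalize by the accumulated weight; Gaussians that were never
    # observed keep an all-zero embedding.
    return [
        [s / totals[g] for s in sums[g]] if totals[g] > 0.0 else sums[g]
        for g in range(num_gaussians)
    ]


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def best_match(embeddings, query_feature):
    """Index of the embedding most similar to the query (brute-force search)."""
    return max(range(len(embeddings)),
               key=lambda g: cosine(embeddings[g], query_feature))


# Toy data: Gaussian 0 is seen twice, Gaussian 1 once, Gaussian 2 never.
contributions = [
    (0, 0.75, [1.0, 0.0]),
    (0, 0.25, [0.0, 1.0]),
    (1, 0.50, [0.0, 1.0]),
]
embeddings = weighted_average_embeddings(contributions, num_gaussians=3, dim=2)
match = best_match(embeddings, [0.0, 1.0])
```

Because each embedding in this sketch is a ratio of two sums, partial sums computed over disjoint sets of observations can be accumulated independently and added together before the final division. That property may be part of why a formulation of this kind lends itself to multi-GPU encoding, though the record itself gives no implementation details.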