SLAG: Scalable Language-Augmented Gaussian Splatting


Bibliographic details
Published in:	IEEE Robotics and Automation Letters, Vol. 10, No. 7, pp. 6991-6998
Main authors:	Szilagyi, Laszlo; Engelmann, Francis; Bohg, Jeannette
Format:	Journal Article
Language:	English
Published:	Piscataway: IEEE, 1 July 2025. The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:	big data in robotics and automation; Cameras; Coding; deep learning for visual perception; Embedding; Graphics processing units; Image reconstruction; Language; Neural radiance field; Representations; Robotics; Robots; Scalability; Semantic scene understanding; Semantics; Slag; software architecture for robotics and automation; Three-dimensional displays; Vectors
ISSN:	2377-3766

Description
Summary:	Language-augmented scene representations hold great promise for large-scale robotics applications such as search-and-rescue, smart cities, and mining. Many of these scenarios are time-sensitive, requiring rapid scene encoding while also being data-intensive, necessitating scalable solutions. Deploying these representations on robots with limited computational resources further adds to the challenge. To address this, we introduce SLAG, a multi-GPU framework for language-augmented Gaussian splatting that enhances the speed and scalability of embedding large scenes. Our method integrates 2D visual-language model features into 3D scenes using SAM (Kirillov et al., 2023) and CLIP (Radford et al., 2021). Unlike prior approaches, SLAG eliminates the need for a loss function to compute per-Gaussian language embeddings. Instead, it derives embeddings from 3D Gaussian scene parameters via a normalized weighted average, enabling highly parallelized scene encoding. Additionally, we introduce a vector database for efficient embedding storage and retrieval. Our experiments show that SLAG achieves an 18× speedup in embedding computation on a 16-GPU setup compared to OpenGaussian (Wu et al., 2024), while preserving embedding quality on the ScanNet (Dai et al., 2017) and LERF (Kerr et al., 2023) datasets.
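The summary describes a loss-free, normalized weighted average for per-Gaussian embeddings and a vector database for retrieval. The following is a minimal sketch of that general idea only, not the authors' implementation. The function names, the `(gaussian_id, weight, feature)` input format, and the choice of weight are all assumptions made for illustration; the record does not specify how SLAG derives its weights from the Gaussian scene parameters. A brute-force cosine-similarity search stands in for the vector database.

```python
import math

def normalize(v):
    """Scale a vector to unit length; return a copy unchanged if it is all zeros."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def aggregate_embeddings(num_gaussians, dim, contributions):
    """Normalized weighted average of 2D features for each Gaussian.

    contributions: iterable of (gaussian_id, weight, feature) tuples.
    ASSUMPTION: weight is some per-pixel contribution of the Gaussian
    (hypothetical here) and feature is a 2D vision-language feature
    observed at that pixel.
    """
    sums = [[0.0] * dim for _ in range(num_gaussians)]
    totals = [0.0] * num_gaussians
    # Accumulation is a plain sum, so no loss function or optimizer is needed.
    for gid, w, feat in contributions:
        totals[gid] += w
        for k in range(dim):
            sums[gid][k] += w * feat[k]
    out = []
    for gid in range(num_gaussians):
        if totals[gid] > 0:
            mean = [s / totals[gid] for s in sums[gid]]
        else:
            mean = [0.0] * dim  # Gaussian never observed
        out.append(normalize(mean))
    return out

def query(embeddings, text_embedding, top_k=1):
    """Brute-force cosine-similarity lookup (stand-in for a vector database)."""
    q = normalize(text_embedding)
    scores = [(sum(a * b for a, b in zip(e, q)), gid)
              for gid, e in enumerate(embeddings)]
    scores.sort(reverse=True)
    return [gid for _, gid in scores[:top_k]]

# Toy data: two Gaussians, 2-dimensional features.
contributions = [
    (0, 0.9, [1.0, 0.0]),
    (0, 0.1, [0.0, 1.0]),
    (1, 0.5, [0.0, 1.0]),
]
embeddings = aggregate_embeddings(2, 2, contributions)
best = query(embeddings, [0.0, 2.0])
```

Because each Gaussian's result depends only on running sums, partial sums computed on separate subsets of the views can simply be added together before the final division. That is one plausible reason a scheme of this kind parallelizes well across GPUs.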
DOI:	10.1109/LRA.2025.3573203