SLAG: Scalable Language-Augmented Gaussian Splatting

Bibliographic Details
Published in: IEEE Robotics and Automation Letters, Vol. 10, no. 7, pp. 6991-6998
Main Authors: Szilagyi, Laszlo; Engelmann, Francis; Bohg, Jeannette
Format: Journal Article
Language: English
Published: Piscataway: IEEE, 2025-07-01
Publisher: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 2377-3766
Abstract Language-augmented scene representations hold great promise for large-scale robotics applications such as search-and-rescue, smart cities, and mining. Many of these scenarios are time-sensitive, requiring rapid scene encoding while also being data-intensive, necessitating scalable solutions. Deploying these representations on robots with limited computational resources further adds to the challenge. To address this, we introduce SLAG, a multi-GPU framework for language-augmented Gaussian splatting that enhances the speed and scalability of embedding large scenes. Our method integrates 2D visual-language model features into 3D scenes using SAM (Kirillov et al., 2023) and CLIP (Radford et al., 2021). Unlike prior approaches, SLAG eliminates the need for a loss function to compute per-Gaussian language embeddings. Instead, it derives embeddings from 3D Gaussian scene parameters via a normalized weighted average, enabling highly parallelized scene encoding. Additionally, we introduce a vector database for efficient embedding storage and retrieval. Our experiments show that SLAG achieves an 18× speedup in embedding computation on a 16-GPU setup compared to OpenGaussian (Wu et al., 2024), while preserving embedding quality on the ScanNet (Dai et al., 2017) and LERF (Kerr et al., 2023) datasets.
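The abstract says per-Gaussian embeddings come from a normalized weighted average rather than from optimizing a loss, and that this is what makes encoding highly parallel. The following is a minimal sketch of that idea only, not the authors' implementation. It assumes each observation supplies a Gaussian index, a scalar weight, and a 2D feature vector; how SLAG actually derives the weights from the Gaussian scene parameters is not stated in this record. The names `partial_sums`, `merge`, and `finalize` are hypothetical.

```python
from collections import defaultdict


def partial_sums(contributions, dim):
    """Accumulate weighted feature sums and weight totals per Gaussian.

    contributions: iterable of (gaussian_id, weight, feature) tuples, where
    feature is a length-`dim` sequence (assumed here to be a 2D feature
    vector such as a CLIP embedding of a segmented region).
    """
    sums = defaultdict(lambda: [0.0] * dim)
    totals = defaultdict(float)
    for gid, weight, feature in contributions:
        acc = sums[gid]
        for i in range(dim):
            acc[i] += weight * feature[i]
        totals[gid] += weight
    return sums, totals


def merge(a, b, dim):
    """Combine partial results from two workers; the sums are additive."""
    sums, totals = partial_sums([], dim)
    for part_sums, part_totals in (a, b):
        for gid, vec in part_sums.items():
            acc = sums[gid]
            for i in range(dim):
                acc[i] += vec[i]
            totals[gid] += part_totals[gid]
    return sums, totals


def finalize(sums, totals):
    """Divide each sum by its total weight: the normalized weighted average."""
    return {
        gid: [v / totals[gid] for v in vec]
        for gid, vec in sums.items()
        if totals[gid] > 0
    }


# Toy data: Gaussian 0 is observed twice, Gaussian 1 once.
views = [(0, 1.0, [1.0, 0.0]), (0, 3.0, [0.0, 1.0]), (1, 2.0, [1.0, 1.0])]
worker_a = partial_sums(views[:1], dim=2)  # e.g. images handled by one GPU
worker_b = partial_sums(views[1:], dim=2)  # e.g. images handled by another
embeddings = finalize(*merge(worker_a, worker_b, dim=2))
```

Because the numerator and denominator are plain sums, each worker can process its own subset of images independently and the partial results can be merged in any order. That is one plausible reading of why dropping the loss function allows the multi-GPU scaling described above.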
Author details:
1. Szilagyi, Laszlo (ORCID: 0009-0004-3975-4499; email: laszlosz@stanford.edu), Department of Computer Science, Stanford University, Stanford, CA, USA
2. Engelmann, Francis (ORCID: 0000-0001-5745-2137), Department of Computer Science, Stanford University, Stanford, CA, USA
3. Bohg, Jeannette (ORCID: 0000-0002-4921-7193), Department of Computer Science, Stanford University, Stanford, CA, USA
CODEN IRALC6
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2025
DOI 10.1109/LRA.2025.3573203
Discipline Engineering
EISSN 2377-3766
EndPage 6998
ExternalDocumentID 10_1109_LRA_2025_3573203
11014241
Genre orig-research
GrantInformation_xml – fundername: SNSF PostDoc. Mobility Fellowship during his stay at Stanford University
IsPeerReviewed true
IsScholarly true
Issue 7
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
PageCount 8
PublicationDate 2025-07-01
PublicationDateYYYYMMDD 2025-07-01
PublicationDate_xml – month: 07
  year: 2025
  text: 2025-07-01
  day: 01
PublicationDecade 2020
PublicationPlace Piscataway
PublicationPlace_xml – name: Piscataway
PublicationTitle IEEE robotics and automation letters
PublicationTitleAbbrev LRA
PublicationYear 2025
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
References
[1] doi: 10.1109/iccv51070.2023.00371
[2] Radford, "Learning transferable visual models from natural language supervision," Proc. Int. Conf. Mach. Learn., 2021, p. 8748.
[3] Wu, "OpenGaussian: Towards point-level 3D Gaussian-based open vocabulary understanding," Proc. Adv. Neural Inf. Process. Syst., vol. 37, 2024, p. 19114.
[4] doi: 10.1109/CVPR.2017.261
[5] doi: 10.1109/ICCV51070.2023.01807
[6] Engelmann, "OpenNerf: Open set 3D neural scene segmentation with pixel-wise features and rendered novel views," Proc. 12th Int. Conf. Learn. Representations, 2024.
[7] doi: 10.1109/CVPR52733.2024.01895
[8] doi: 10.1109/CVPR52733.2024.00510
[9] doi: 10.1007/978-3-031-72627-9_4
[10] doi: 10.1007/978-3-030-58452-8_24
[11] doi: 10.1145/3592433
[12] doi: 10.1109/CVPR52733.2024.00463
[13] doi: 10.1109/ICCV51070.2023.00125
[14] Yue, "AGILE3D: Attention guided interactive multi-object 3D segmentation," Proc. Int. Conf. Learn. Representations, 2024.
[15] doi: 10.1109/LRA.2025.3534523
[16] doi: 10.1109/ICRA48891.2023.10160969
[17] Lemke, "Spot-Compose: A framework for open-vocabulary object retrieval and drawer manipulation in point clouds," Proc. 2nd Workshop Mobile Manipulation Embodied Intell. at ICRA, 2024.
[18] Rashid, "Language embedded radiance fields for zero-shot task-oriented grasping," Proc. 7th Annu. Conf. Robot Learn., 2023.
[19] Shen, "Distilled feature fields enable few-shot language-guided manipulation," Proc. 7th Annu. Conf. Robot Learn., 2023.
[20] doi: 10.1109/ICRA57147.2024.10611725
[21] doi: 10.1609/aaai.v39i2.32193
[22] Turki, "Mega-NERF: Scalable construction of large-scale NeRFs," Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2022, p. 12922.
[23] doi: 10.1109/3DV62453.2024.00075
[24] Ji, "ARKit LabelMaker: A new scale for indoor 3D scene understanding," Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2025.
[25] doi: 10.1109/WACV61041.2025.00503
[26] Zhang, "Open-vocabulary functional 3D scene graphs for real-world indoor spaces," Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2025.
[27] doi: 10.1007/978-3-031-73397-0_10
[28] Schönberger, "A vote-and-verify strategy for fast spatial verification in image retrieval," Proc. Comput. Vis. ACCV 13th Asian Conf. Comput. Vis., 2016, p. 321.
[29] doi: 10.1109/WACV51458.2022.00036
[30] doi: 10.1007/978-3-031-91989-3_2
[31] Ye, "Mathematical supplement for the gsplat library," 2023.
[32] doi: 10.1023/B:VISI.0000022288.19776.77
[33] doi: 10.1145/3588432.3591516
SubjectTerms big data in robotics and automation
Cameras
Coding
deep learning for visual perception
Embedding
Graphics processing units
Image reconstruction
Language
Neural radiance field
Representations
Robotics
Robots
Scalability
Semantic scene understanding
Semantics
Slag
software architecture for robotics and automation
Three-dimensional displays
Vectors
Title SLAG: Scalable Language-Augmented Gaussian Splatting
URI https://ieeexplore.ieee.org/document/11014241
https://www.proquest.com/docview/3215953941
Volume 10