SLAG: Scalable Language-Augmented Gaussian Splatting
| Published in: | IEEE Robotics and Automation Letters, Vol. 10, No. 7, pp. 6991–6998 |
|---|---|
| Authors: | Laszlo Szilagyi, Francis Engelmann, Jeannette Bohg |
| Format: | Journal Article |
| Language: | English |
| Published: | Piscataway: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), July 1, 2025 |
| ISSN: | 2377-3766, 2377-3766 |
| Abstract | Language-augmented scene representations hold great promise for large-scale robotics applications such as search-and-rescue, smart cities, and mining. Many of these scenarios are time-sensitive, requiring rapid scene encoding while also being data-intensive, necessitating scalable solutions. Deploying these representations on robots with limited computational resources further adds to the challenge. To address this, we introduce SLAG, a multi-GPU framework for language-augmented Gaussian splatting that enhances the speed and scalability of embedding large scenes. Our method integrates 2D visual-language model features into 3D scenes using SAM (Kirillov et al., 2023) and CLIP (Radford et al., 2021). Unlike prior approaches, SLAG eliminates the need for a loss function to compute per-Gaussian language embeddings. Instead, it derives embeddings from 3D Gaussian scene parameters via a normalized weighted average, enabling highly parallelized scene encoding. Additionally, we introduce a vector database for efficient embedding storage and retrieval. Our experiments show that SLAG achieves an 18× speedup in embedding computation on a 16-GPU setup compared to OpenGaussian (Wu et al., 2024), while preserving embedding quality on the ScanNet (Dai et al., 2017) and LERF (Kerr et al., 2023) datasets. |
|---|---|
| Author | Bohg, Jeannette; Engelmann, Francis; Szilagyi, Laszlo |
| Author_xml | – sequence: 1 givenname: Laszlo orcidid: 0009-0004-3975-4499 surname: Szilagyi fullname: Szilagyi, Laszlo email: laszlosz@stanford.edu organization: Department of Computer Science, Stanford University, Stanford, CA, USA – sequence: 2 givenname: Francis orcidid: 0000-0001-5745-2137 surname: Engelmann fullname: Engelmann, Francis organization: Department of Computer Science, Stanford University, Stanford, CA, USA – sequence: 3 givenname: Jeannette orcidid: 0000-0002-4921-7193 surname: Bohg fullname: Bohg, Jeannette organization: Department of Computer Science, Stanford University, Stanford, CA, USA |
| CODEN | IRALC6 |
| Cites_doi | 10.1109/CVPR52733.2024.01895 10.1109/ICCV51070.2023.00125 10.1109/3DV62453.2024.00075 10.1145/3592433 10.1109/CVPR52733.2024.00510 10.1109/WACV61041.2025.00503 10.1109/ICCV51070.2023.01807 10.1109/ICRA48891.2023.10160969 10.1109/LRA.2025.3534523 10.1007/978-3-030-58452-8_24 10.1145/3588432.3591516 10.1007/978-3-031-91989-3_2 10.1007/978-3-031-73397-0_10 10.1109/WACV51458.2022.00036 10.1109/CVPR52733.2024.00463 10.1023/B:VISI.0000022288.19776.77 10.1109/CVPR.2017.261 10.1609/aaai.v39i2.32193 10.1007/978-3-031-72627-9_4 10.1109/ICRA57147.2024.10611725 10.1109/iccv51070.2023.00371 |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2025 |
| Copyright_xml | – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2025 |
| DBID | 97E RIA RIE AAYXX CITATION 7SC 7SP 8FD JQ2 L7M L~C L~D |
| DOI | 10.1109/LRA.2025.3573203 |
| DatabaseName | IEEE Xplore (IEEE); IEEE All-Society Periodicals Package (ASPP) 1998–Present; IEEE Electronic Library (IEL); CrossRef; Computer and Information Systems Abstracts; Electronics & Communications Abstracts; Technology Research Database; ProQuest Computer Science Collection; Advanced Technologies Database with Aerospace; Computer and Information Systems Abstracts Academic; Computer and Information Systems Abstracts Professional |
| DatabaseTitle | CrossRef; Technology Research Database; Computer and Information Systems Abstracts – Academic; Electronics & Communications Abstracts; ProQuest Computer Science Collection; Computer and Information Systems Abstracts; Advanced Technologies Database with Aerospace; Computer and Information Systems Abstracts Professional |
| DatabaseTitleList | Technology Research Database |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Xplore url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Engineering |
| EISSN | 2377-3766 |
| EndPage | 6998 |
| ExternalDocumentID | 10_1109_LRA_2025_3573203 11014241 |
| Genre | orig-research |
| GrantInformation_xml | – fundername: SNSF PostDoc. Mobility Fellowship during his stay at Stanford University |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 0 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001502469600008&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 2377-3766 |
| IngestDate | Sat Nov 22 13:40:28 EST 2025 Sat Nov 29 07:51:08 EST 2025 Wed Aug 27 01:52:23 EDT 2025 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 7 |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
| LinkModel | DirectLink |
| Notes | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 |
| ORCID | 0000-0002-4921-7193 0009-0004-3975-4499 0000-0001-5745-2137 |
| PQID | 3215953941 |
| PQPubID | 4437225 |
| PageCount | 8 |
| ParticipantIDs | crossref_primary_10_1109_LRA_2025_3573203 ieee_primary_11014241 proquest_journals_3215953941 |
| PublicationCentury | 2000 |
| PublicationDate | 2025-07-01 |
| PublicationDateYYYYMMDD | 2025-07-01 |
| PublicationDate_xml | – month: 07 year: 2025 text: 2025-07-01 day: 01 |
| PublicationDecade | 2020 |
| PublicationPlace | Piscataway |
| PublicationPlace_xml | – name: Piscataway |
| PublicationTitle | IEEE robotics and automation letters |
| PublicationTitleAbbrev | LRA |
| PublicationYear | 2025 |
| Publisher | IEEE; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Publisher_xml | – name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| References | [ref1] DOI: 10.1109/iccv51070.2023.00371
[ref2] Radford, "Learning transferable visual models from natural language supervision," in Proc. Int. Conf. Mach. Learn., 2021, p. 8748.
[ref3] Wu, "OpenGaussian: Towards point-level 3D Gaussian-based open vocabulary understanding," in Proc. Adv. Neural Inf. Process. Syst., vol. 37, 2024, p. 19114.
[ref4] DOI: 10.1109/CVPR.2017.261
[ref5] DOI: 10.1109/ICCV51070.2023.01807
[ref6] Engelmann, "OpenNeRF: Open set 3D neural scene segmentation with pixel-wise features and rendered novel views," in Proc. 12th Int. Conf. Learn. Representations, 2024.
[ref7] DOI: 10.1109/CVPR52733.2024.01895
[ref8] DOI: 10.1109/CVPR52733.2024.00510
[ref9] DOI: 10.1007/978-3-031-72627-9_4
[ref10] DOI: 10.1007/978-3-030-58452-8_24
[ref11] DOI: 10.1145/3592433
[ref12] DOI: 10.1109/CVPR52733.2024.00463
[ref13] DOI: 10.1109/ICCV51070.2023.00125
[ref14] Yue, "AGILE3D: Attention guided interactive multi-object 3D segmentation," in Proc. Int. Conf. Learn. Representations, 2024.
[ref15] DOI: 10.1109/LRA.2025.3534523
[ref16] DOI: 10.1109/ICRA48891.2023.10160969
[ref17] Lemke, "Spot-Compose: A framework for open-vocabulary object retrieval and drawer manipulation in point clouds," in Proc. 2nd Workshop Mobile Manipulation Embodied Intell. at ICRA, 2024.
[ref18] Rashid, "Language embedded radiance fields for zero-shot task-oriented grasping," in Proc. 7th Annu. Conf. Robot Learn., 2023.
[ref19] Shen, "Distilled feature fields enable few-shot language-guided manipulation," in Proc. 7th Annu. Conf. Robot Learn., 2023.
[ref20] DOI: 10.1109/ICRA57147.2024.10611725
[ref21] DOI: 10.1609/aaai.v39i2.32193
[ref22] Turki, "Mega-NeRF: Scalable construction of large-scale NeRFs," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2022, p. 12922.
[ref23] DOI: 10.1109/3DV62453.2024.00075
[ref24] Ji, "ARKit LabelMaker: A new scale for indoor 3D scene understanding," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2025.
[ref25] DOI: 10.1109/WACV61041.2025.00503
[ref26] Zhang, "Open-vocabulary functional 3D scene graphs for real-world indoor spaces," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2025.
[ref27] DOI: 10.1007/978-3-031-73397-0_10
[ref28] Schönberger, "A vote-and-verify strategy for fast spatial verification in image retrieval," in Proc. 13th Asian Conf. Comput. Vis. (ACCV), 2016, p. 321.
[ref29] DOI: 10.1109/WACV51458.2022.00036
[ref30] DOI: 10.1007/978-3-031-91989-3_2
[ref31] Ye, "Mathematical supplement for the gsplat library," 2023.
[ref32] DOI: 10.1023/B:VISI.0000022288.19776.77
[ref33] DOI: 10.1145/3588432.3591516 |
| SSID | ssj0001527395 |
| SourceID | proquest crossref ieee |
| SourceType | Aggregation Database; Index Database; Publisher |
| StartPage | 6991 |
| SubjectTerms | Big data in robotics and automation; Cameras; Coding; Deep learning for visual perception; Embedding; Graphics processing units; Image reconstruction; Language; Neural radiance field; Representations; Robotics; Robots; Scalability; Semantic scene understanding; Semantics; Slag; Software architecture for robotics and automation; Three-dimensional displays; Vectors |
| Title | SLAG: Scalable Language-Augmented Gaussian Splatting |
| URI | https://ieeexplore.ieee.org/document/11014241 https://www.proquest.com/docview/3215953941 |
| Volume | 10 |
| WOSCitedRecordID | wos001502469600008 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| journalDatabaseRights | – providerCode: PRVIEE databaseName: IEEE Xplore customDbUrl: eissn: 2377-3766 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0001527395 issn: 2377-3766 databaseCode: RIE dateStart: 20160101 isFulltext: true titleUrlDefault: https://ieeexplore.ieee.org/ providerName: IEEE – providerCode: PRVHPJ databaseName: ROAD: Directory of Open Access Scholarly Resources customDbUrl: eissn: 2377-3766 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0001527395 issn: 2377-3766 databaseCode: M~E dateStart: 20160101 isFulltext: true titleUrlDefault: https://road.issn.org providerName: ISSN International Centre |
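
The abstract in this record states that SLAG derives per-Gaussian language embeddings from 3D Gaussian scene parameters via a normalized weighted average instead of optimizing a loss, and that this enables highly parallelized encoding. The following is a minimal, hypothetical Python sketch of why such an average parallelizes, not the authors' implementation. It assumes (not confirmed by this record) that each pixel carries the CLIP embedding of the SAM mask covering it and that the weight of a Gaussian-pixel pair is its alpha-blending contribution from rasterization. The names `accumulate`, `merge`, `finalize`, and `top_k` are illustrative, and `top_k` is a brute-force cosine search standing in for the vector database, not its actual interface.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = List[float]
# Hypothetical per-view input: per-pixel 2D features (assumed: CLIP embedding of
# the SAM mask covering each pixel) and (gaussian_id, pixel_index, blend_weight)
# triples assumed to come from the Gaussian rasterizer.
View = Tuple[Sequence[Vector], Sequence[Tuple[int, int, float]]]
Partial = Tuple[List[Vector], List[float]]


def accumulate(views: Sequence[View], num_gaussians: int, dim: int) -> Partial:
    """Per-Gaussian sums of weighted features (numerator) and weights (denominator)."""
    num = [[0.0] * dim for _ in range(num_gaussians)]
    den = [0.0] * num_gaussians
    for features, contributions in views:
        for g, p, w in contributions:
            den[g] += w
            for d in range(dim):
                num[g][d] += w * features[p][d]
    return num, den


def merge(partials: Sequence[Partial]) -> Partial:
    """Sum partial results from several workers (e.g. one per GPU); needs >= 1 partial."""
    num = [row[:] for row in partials[0][0]]
    den = list(partials[0][1])
    for other_num, other_den in partials[1:]:
        for g in range(len(den)):
            den[g] += other_den[g]
            for d in range(len(num[g])):
                num[g][d] += other_num[g][d]
    return num, den


def finalize(num: List[Vector], den: List[float]) -> List[Optional[Vector]]:
    """Divide by the total weight, then L2-normalize so a dot product is a cosine."""
    out: List[Optional[Vector]] = []
    for row, total in zip(num, den):
        if total == 0.0:
            out.append(None)  # Gaussian not visible in any processed view
            continue
        mean = [x / total for x in row]
        norm = math.sqrt(sum(x * x for x in mean))
        out.append([x / norm for x in mean] if norm > 0.0 else mean)
    return out


def top_k(query: Vector, embeddings: Sequence[Optional[Vector]], k: int) -> List[int]:
    """Brute-force cosine search; a stand-in for a real vector database."""
    q_norm = math.sqrt(sum(x * x for x in query))
    scored = [
        (sum(a * b for a, b in zip(query, e)) / q_norm, g)
        for g, e in enumerate(embeddings)
        if e is not None
    ]
    scored.sort(reverse=True)
    return [g for _, g in scored[:k]]


# Toy scene: 3 Gaussians, 2-D features, two views; Gaussian 2 is never seen.
view_a: View = ([[1.0, 0.0], [0.0, 1.0]], [(0, 0, 0.9), (0, 1, 0.1), (1, 1, 0.5)])
view_b: View = ([[1.0, 0.0]], [(1, 0, 0.5)])

single = finalize(*accumulate([view_a, view_b], 3, 2))
split = finalize(*merge([accumulate([view_a], 3, 2), accumulate([view_b], 3, 2)]))
print(top_k([1.0, 0.0], single, k=1))  # [0]
```

Because the average is a ratio of two sums, each worker can accumulate its own subset of views and the partial sums can be added before the single final division, so in the toy scene `split` matches `single`. This only illustrates why a loss-free weighted average permits parallel encoding; how SLAG actually distributes work across its 16-GPU setup is not described in this record.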