Configurable Embodied Data Generation for Class-Agnostic RGB-D Video Segmentation
This letter presents a method for generating large-scale datasets to improve class-agnostic video segmentation across robots with different form factors. Specifically, we consider the question of whether video segmentation models trained on generic segmentation data could be more effective for parti...
| Published in: | IEEE Robotics and Automation Letters, vol. 9, no. 12, pp. 11409-11416 |
|---|---|
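| Main authors: | Opipari, Anthony; Krishnan, Aravindhan K; Gayaka, Shreekant; Sun, Min; Kuo, Cheng-Hao; Sen, Arnie; Jenkins, Odest Chadwicke |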
| Format: | Journal Article |
| Language: | English |
| Published: | Piscataway, IEEE, 1 December 2024, The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| ISSN: | 2377-3766 |
| Online access: | Full text |
| Abstract | This letter presents a method for generating large-scale datasets to improve class-agnostic video segmentation across robots with different form factors. Specifically, we consider the question of whether video segmentation models trained on generic segmentation data could be more effective for particular robot platforms if robot embodiment is factored into the data generation process. To answer this question, a pipeline is formulated for using 3D reconstructions (e.g. from HM3DSem (Yadav et al., 2023)) to generate segmented videos that are configurable based on a robot's embodiment (e.g. sensor type, sensor placement, and illumination source). A resulting massive RGB-D video panoptic segmentation dataset (MVPd) is introduced for extensive benchmarking with foundation and video segmentation models, as well as to support embodiment-focused research in video segmentation. Our experimental findings demonstrate that using MVPd for finetuning can lead to performance improvements when transferring foundation models to certain robot embodiments, such as specific camera placements. These experiments also show that using 3D modalities (depth images and camera pose) can lead to improvements in video segmentation accuracy and consistency. |
|---|---|
| Author | Krishnan, Aravindhan K; Gayaka, Shreekant; Kuo, Cheng-Hao; Sun, Min; Jenkins, Odest Chadwicke; Opipari, Anthony; Sen, Arnie |
| Author_xml | 1. Opipari, Anthony (ORCID: 0000-0002-4093-302X; email: topipari@umich.edu; University of Michigan, Ann Arbor, MI, USA); 2. Krishnan, Aravindhan K (ORCID: 0009-0007-2348-7826; email: krsar@amazon.com; Amazon Inc., Seattle, WA, USA); 3. Gayaka, Shreekant (email: sgayaka@amazon.com; Amazon Inc., Seattle, WA, USA); 4. Sun, Min (email: minnsun@amazon.com; Amazon Inc., Seattle, WA, USA); 5. Kuo, Cheng-Hao (email: chkuo@amazon.com; Amazon Inc., Seattle, WA, USA); 6. Sen, Arnie (email: senarnie@amazon.com; Amazon Inc., Seattle, WA, USA); 7. Jenkins, Odest Chadwicke (ORCID: 0000-0003-3750-7334; email: ocj@umich.edu; University of Michigan, Ann Arbor, MI, USA) |
| CODEN | IRALC6 |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024 |
| DOI | 10.1109/LRA.2024.3486213 |
| Discipline | Engineering |
| EISSN | 2377-3766 |
| EndPage | 11416 |
| ExternalDocumentID | 10_1109_LRA_2024_3486213 10733985 |
| Genre | orig-research |
| ISICitedReferencesCount | 1 |
| ISSN | 2377-3766 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 12 |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
| ORCID | 0009-0007-2348-7826 0000-0002-4093-302X 0000-0003-3750-7334 |
| PageCount | 8 |
| PublicationCentury | 2000 |
| PublicationDate | 2024-12-01 |
| PublicationPlace | Piscataway |
| PublicationTitle | IEEE robotics and automation letters |
| PublicationTitleAbbrev | LRA |
| PublicationYear | 2024 |
| Publisher | IEEE; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| StartPage | 11409 |
| SubjectTerms | Benchmark testing; Cameras; Data collection; Data sets for robotic vision; Datasets; Form factors; Image segmentation; Motion segmentation; Object detection; RGB-D perception; Robot vision systems; Robots; Segmentation and categorization; Semantics; Sensor placement; Three-dimensional displays; Trajectory; Video data |
| Title | Configurable Embodied Data Generation for Class-Agnostic RGB-D Video Segmentation |
| URI | https://ieeexplore.ieee.org/document/10733985 https://www.proquest.com/docview/3127756196 |
| Volume | 9 |