Configurable Embodied Data Generation for Class-Agnostic RGB-D Video Segmentation

Published in: IEEE Robotics and Automation Letters, vol. 9, no. 12, pp. 11409-11416
Main authors: Opipari, Anthony; Krishnan, Aravindhan K; Gayaka, Shreekant; Sun, Min; Kuo, Cheng-Hao; Sen, Arnie; Jenkins, Odest Chadwicke
Format: Journal Article
Language: English
Published: Piscataway, IEEE, 1 December 2024
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 2377-3766
Abstract: This letter presents a method for generating large-scale datasets to improve class-agnostic video segmentation across robots with different form factors. Specifically, we consider the question of whether video segmentation models trained on generic segmentation data could be more effective for particular robot platforms if robot embodiment is factored into the data generation process. To answer this question, a pipeline is formulated for using 3D reconstructions (e.g., from HM3DSem (Yadav et al., 2023)) to generate segmented videos that are configurable based on a robot's embodiment (e.g., sensor type, sensor placement, and illumination source). A resulting massive RGB-D video panoptic segmentation dataset (MVPd) is introduced for extensive benchmarking with foundation and video segmentation models, as well as to support embodiment-focused research in video segmentation. Our experimental findings demonstrate that using MVPd for finetuning can lead to performance improvements when transferring foundation models to certain robot embodiments, such as specific camera placements. These experiments also show that using 3D modalities (depth images and camera pose) can lead to improvements in video segmentation accuracy and consistency.
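As an illustration of what "configurable based on a robot's embodiment" could mean in practice, the following minimal Python sketch groups the three factors named in the abstract (sensor type, sensor placement, illumination source) into one configuration object and derives a camera pose from it. It is not the authors' pipeline: the class `EmbodimentConfig`, its field names, the function `camera_pose`, and the example values are all hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbodimentConfig:
    """Hypothetical per-robot settings; field names are illustrative only."""
    sensor_type: str         # e.g. "rgbd"
    camera_height_m: float   # sensor placement: height above the floor
    camera_pitch_deg: float  # sensor placement: downward tilt, 0 = level
    illumination: str        # e.g. "ambient" or "onboard_spotlight"


def camera_pose(base_x, base_y, heading_deg, cfg):
    """Return (position, unit viewing direction) for a robot base pose.

    The base pose is a point on the floor plus a heading; the embodiment
    configuration lifts the camera and tilts it downward.
    """
    yaw = math.radians(heading_deg)
    pitch = math.radians(cfg.camera_pitch_deg)
    direction = (
        math.cos(pitch) * math.cos(yaw),
        math.cos(pitch) * math.sin(yaw),
        -math.sin(pitch),  # positive pitch looks toward the floor
    )
    position = (base_x, base_y, cfg.camera_height_m)
    return position, direction


# Two hypothetical embodiments observing the same trajectory point.
low_rover = EmbodimentConfig("rgbd", 0.2, 0.0, "onboard_spotlight")
tall_robot = EmbodimentConfig("rgbd", 1.5, 30.0, "ambient")

for cfg in (low_rover, tall_robot):
    position, direction = camera_pose(1.0, 2.0, 90.0, cfg)
    print(cfg.sensor_type, cfg.illumination, position, direction)
```

In a sketch like this, the same trajectory through a 3D reconstruction could be rendered once per configuration, so that each robot form factor receives video from its own viewpoint and lighting. How the actual pipeline parameterizes these factors is not stated in this record.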
Author_xml – sequence: 1
  givenname: Anthony
  orcidid: 0000-0002-4093-302X
  surname: Opipari
  fullname: Opipari, Anthony
  email: topipari@umich.edu
  organization: University of Michigan, Ann Arbor, MI, USA
– sequence: 2
  givenname: Aravindhan K
  orcidid: 0009-0007-2348-7826
  surname: Krishnan
  fullname: Krishnan, Aravindhan K
  email: krsar@amazon.com
  organization: Amazon Inc., Seattle, WA, USA
– sequence: 3
  givenname: Shreekant
  surname: Gayaka
  fullname: Gayaka, Shreekant
  email: sgayaka@amazon.com
  organization: Amazon Inc., Seattle, WA, USA
– sequence: 4
  givenname: Min
  surname: Sun
  fullname: Sun, Min
  email: minnsun@amazon.com
  organization: Amazon Inc., Seattle, WA, USA
– sequence: 5
  givenname: Cheng-Hao
  surname: Kuo
  fullname: Kuo, Cheng-Hao
  email: chkuo@amazon.com
  organization: Amazon Inc., Seattle, WA, USA
– sequence: 6
  givenname: Arnie
  surname: Sen
  fullname: Sen, Arnie
  email: senarnie@amazon.com
  organization: Amazon Inc., Seattle, WA, USA
– sequence: 7
  givenname: Odest Chadwicke
  orcidid: 0000-0003-3750-7334
  surname: Jenkins
  fullname: Jenkins, Odest Chadwicke
  email: ocj@umich.edu
  organization: University of Michigan, Ann Arbor, MI, USA
CODEN IRALC6
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024
DOI 10.1109/LRA.2024.3486213
Discipline Engineering
EISSN 2377-3766
EndPage 11416
ExternalDocumentID 10_1109_LRA_2024_3486213
10733985
Genre orig-research
ISICitedReferencesCount 1
ISSN 2377-3766
IngestDate Mon Jun 30 12:59:50 EDT 2025
Sat Nov 29 01:34:40 EST 2025
Wed Aug 27 02:29:08 EDT 2025
IsPeerReviewed true
IsScholarly true
Issue 12
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
ORCID 0009-0007-2348-7826
0000-0002-4093-302X
0000-0003-3750-7334
PQID 3127756196
PQPubID 4437225
PageCount 8
ParticipantIDs proquest_journals_3127756196
ieee_primary_10733985
crossref_primary_10_1109_LRA_2024_3486213
PublicationCentury 2000
PublicationDate 2024-12-01
PublicationDateYYYYMMDD 2024-12-01
PublicationDate_xml – month: 12
  year: 2024
  text: 2024-12-01
  day: 01
PublicationDecade 2020
PublicationPlace Piscataway
PublicationPlace_xml – name: Piscataway
PublicationTitle IEEE robotics and automation letters
PublicationTitleAbbrev LRA
PublicationYear 2024
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
SubjectTerms Benchmark testing
Cameras
Data collection
data sets for robotic vision
Datasets
Form factors
Image segmentation
Motion segmentation
Object detection
RGB-D perception
Robot vision systems
Robots
segmentation and categorization
Semantics
Sensor placement
Three-dimensional displays
Trajectory
Video data
Title Configurable Embodied Data Generation for Class-Agnostic RGB-D Video Segmentation
URI https://ieeexplore.ieee.org/document/10733985
https://www.proquest.com/docview/3127756196
Volume 9