CALVIN: A Benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
General-purpose robots coexisting with humans in their environment must learn to relate human language to their perceptions and actions to be useful in a range of daily tasks. Moreover, they need to acquire a diverse repertoire of general-purpose skills that allow composing long-horizon tasks by fol...
| Published in: | IEEE robotics and automation letters, Volume 7, Issue 3, pp. 7327-7334 |
|---|---|
| Main authors: | Mees, Oier; Hermann, Lukas; Rosete-Beas, Erick; Burgard, Wolfram |
| Format: | Journal Article |
| Language: | English |
| Published: | Piscataway: IEEE, The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 1 July 2022 |
| Subjects: | |
| ISSN: | 2377-3766 |
| Online access: | Get full text |
| Abstract | General-purpose robots coexisting with humans in their environment must learn to relate human language to their perceptions and actions to be useful in a range of daily tasks. Moreover, they need to acquire a diverse repertoire of general-purpose skills that allow composing long-horizon tasks by following unconstrained language instructions. In this letter, we present Composing Actions from Language and Vision (CALVIN), an open-source simulated benchmark to learn long-horizon language-conditioned tasks. Our aim is to make it possible to develop agents that can solve many robotic manipulation tasks over a long horizon, from onboard sensors, and specified only via human language. CALVIN tasks are more complex in terms of sequence length, action space, and language than those in existing vision-and-language task datasets, and CALVIN supports flexible specification of sensor suites. We evaluate the agents zero-shot on novel language instructions and in novel environments. We show that a baseline model based on multi-context imitation learning performs poorly on CALVIN, suggesting that there is significant room for developing innovative agents that learn to relate human language to their world models with this benchmark. |
|---|---|
| Author | Hermann, Lukas; Burgard, Wolfram; Mees, Oier; Rosete-Beas, Erick |
| Author_xml | – sequence: 1 givenname: Oier orcidid: 0000-0001-6020-9744 surname: Mees fullname: Mees, Oier email: meeso@informatik.uni-freiburg.de organization: University of Freiburg, Freiburg, Germany – sequence: 2 givenname: Lukas surname: Hermann fullname: Hermann, Lukas email: hermannl@informatik.uni-freiburg.de organization: University of Freiburg, Freiburg, Germany – sequence: 3 givenname: Erick surname: Rosete-Beas fullname: Rosete-Beas, Erick email: erick.rosete@students.uni-freiburg.de organization: University of Freiburg, Freiburg, Germany – sequence: 4 givenname: Wolfram orcidid: 0000-0002-5680-6500 surname: Burgard fullname: Burgard, Wolfram email: burgard@informatik.uni-freiburg.de organization: University of Freiburg, Freiburg, Germany |
| BookMark | eNp9kDtPwzAUhS1UJErpjsRiiTnFjyS22UoFtFJ4qCqskZPYwW2wi5MM5deTkgohBqZ7h_Pdc885BQPrrALgHKMJxkhcJcvphCBCJhRzhBE_AkNCGQsoi-PBr_0EjOt6jRDCEWFUREOgZ9PkdfF4DafwRtn87V36DdTOw0TaspWlCmbOFqYxnWEBn11l8h1MlPTW2LIXOlsGc-fNp7Nw6TLXwAdpzbat5J6CK1lv6jNwrGVVq_FhjsDL3e1qNg-Sp_tF90KQE4GbgOWxEDpHWBaYc6YER6yQOlY6yzDJeE7CWHOiIsQpIZFkUcgIpiHSYRZKLOkIXPZ3t959tKpu0rVrve0sUxIzQQWJUNSpUK_Kvatrr3S69aZLvksxSveFpl2h6b7Q9FBoh8R_kNw03wEbL031H3jRg0Yp9eMjGOeIxPQLwmCDFQ |
| CODEN | IRALC6 |
| CitedBy_id | crossref_primary_10_1016_j_robot_2022_104294 crossref_primary_10_1007_s10514_023_10129_1 crossref_primary_10_1109_LRA_2024_3421849 crossref_primary_10_1177_02783649251351658 crossref_primary_10_1109_LRA_2025_3526436 crossref_primary_10_1109_LRA_2025_3595034 crossref_primary_10_1080_01691864_2024_2408593 crossref_primary_10_1109_TRO_2025_3577437 crossref_primary_10_1109_LRA_2024_3477095 crossref_primary_10_1016_j_comcom_2024_04_029 crossref_primary_10_1016_j_inffus_2025_103652 crossref_primary_10_1109_LRA_2024_3433309 crossref_primary_10_1007_s12555_024_0438_7 crossref_primary_10_1080_01691864_2024_2379381 crossref_primary_10_1109_LRA_2025_3585390 crossref_primary_10_1007_s10514_023_10131_7 crossref_primary_10_1007_s10514_023_10134_4 crossref_primary_10_1016_j_inffus_2025_103198 crossref_primary_10_1109_LRA_2022_3196123 crossref_primary_10_1109_LRA_2023_3313058 crossref_primary_10_1109_LRA_2024_3443610 crossref_primary_10_3389_frobt_2025_1606247 crossref_primary_10_1016_j_engappai_2025_111004 crossref_primary_10_1109_LRA_2024_3466076 crossref_primary_10_1109_LRA_2025_3597846 crossref_primary_10_1631_FITEE_2300548 crossref_primary_10_1016_j_neucom_2025_129963 crossref_primary_10_1002_aaai_12197 crossref_primary_10_1016_j_imavis_2024_105280 crossref_primary_10_1177_02783649241304789 crossref_primary_10_1109_LRA_2025_3575013 crossref_primary_10_1016_j_jmsy_2024_05_003 crossref_primary_10_7717_peerj_cs_2097 crossref_primary_10_1007_s11081_025_09990_z |
| Cites_doi | 10.1007/978-3-030-58621-8_45 10.1109/CVPR.2018.00387 10.1109/ICRA40945.2020.9196582 10.1109/ICRA.2019.8793485 10.15607/RSS.2020.XVI.080 10.3115/v1/D14-1086 10.15607/RSS.2016.XII.037 10.1109/LRA.2022.3146945 10.1177/02783649211046285 10.18653/v1/D17-1106 10.15607/RSS.2021.XVII.047 10.1109/CVPR42600.2020.01075 10.1109/IROS.2016.7759048 10.1109/ICRA48506.2021.9560895 10.1007/978-3-030-71151-1_43 10.1146/annurev-control-101119-071628 10.15607/RSS.2018.XIV.028 10.1109/ICRA.2018.8460699 10.15607/RSS.2021.XVII.020 |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022 |
| Copyright_xml | – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022 |
| DBID | 97E RIA RIE AAYXX CITATION 7SC 7SP 8FD JQ2 L7M L~C L~D |
| DOI | 10.1109/LRA.2022.3180108 |
| DatabaseName | IEEE Xplore (IEEE) IEEE All-Society Periodicals Package (ASPP) 1998–Present IEEE Electronic Library (IEL) CrossRef Computer and Information Systems Abstracts Electronics & Communications Abstracts Technology Research Database ProQuest Computer Science Collection Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional |
| DatabaseTitle | CrossRef Technology Research Database Computer and Information Systems Abstracts – Academic Electronics & Communications Abstracts ProQuest Computer Science Collection Computer and Information Systems Abstracts Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Professional |
| DatabaseTitleList | Technology Research Database |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Xplore url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Engineering |
| EISSN | 2377-3766 |
| EndPage | 7334 |
| ExternalDocumentID | 10_1109_LRA_2022_3180108 9788026 |
| Genre | orig-research |
| GrantInformation_xml | – fundername: German Federal Ministry of Education and Research grantid: 01IS18040B-OML |
| GroupedDBID | 0R~ 97E AAJGR AARMG AASAJ AAWTH ABAZT ABQJQ ABVLG ACGFS AGQYO AGSQL AHBIQ AKJIK AKQYR ALMA_UNASSIGNED_HOLDINGS ATWAV BEFXN BFFAM BGNUA BKEBE BPEOZ EBS EJD IFIPE IPLJI JAVBF KQ8 M43 M~E O9- OCL RIA RIE AAYXX CITATION 7SC 7SP 8FD JQ2 L7M L~C L~D |
| ID | FETCH-LOGICAL-c291t-7c699fc01ad1887e9807daf6efbb12b8c246f82e5083225a754721340f4b4a1a3 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 94 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000814637000014&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 2377-3766 |
| IngestDate | Sun Nov 30 04:06:16 EST 2025 Sat Nov 29 06:03:17 EST 2025 Tue Nov 18 19:41:45 EST 2025 Wed Aug 27 02:23:54 EDT 2025 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 3 |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-c291t-7c699fc01ad1887e9807daf6efbb12b8c246f82e5083225a754721340f4b4a1a3 |
| Notes | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 |
| ORCID | 0000-0002-5680-6500 0000-0001-6020-9744 |
| PQID | 2679392505 |
| PQPubID | 4437225 |
| PageCount | 8 |
| ParticipantIDs | crossref_citationtrail_10_1109_LRA_2022_3180108 ieee_primary_9788026 proquest_journals_2679392505 crossref_primary_10_1109_LRA_2022_3180108 |
| PublicationCentury | 2000 |
| PublicationDate | 2022-07-01 |
| PublicationDateYYYYMMDD | 2022-07-01 |
| PublicationDate_xml | – month: 07 year: 2022 text: 2022-07-01 day: 01 |
| PublicationDecade | 2020 |
| PublicationPlace | Piscataway |
| PublicationPlace_xml | – name: Piscataway |
| PublicationTitle | IEEE robotics and automation letters |
| PublicationTitleAbbrev | LRA |
| PublicationYear | 2022 |
| Publisher | IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Publisher_xml | – name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| SSID | ssj0001527395 |
| Score | 2.6338027 |
| Snippet | General-purpose robots coexisting with humans in their environment must learn to relate human language to their perceptions and actions to be useful in a range... |
| SourceID | proquest crossref ieee |
| SourceType | Aggregation Database Enrichment Source Index Database Publisher |
| StartPage | 7327 |
| SubjectTerms | Benchmark testing; Benchmarks; Cameras; Data sets for robot learning; Grippers; Horizon; imitation learning; Language; machine learning for robot control; natural dialog for HRI; Robot learning; Robot sensing systems; Robot vision systems; Robots; Task analysis; Task complexity |
| Title | CALVIN: A Benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks |
| URI | https://ieeexplore.ieee.org/document/9788026 https://www.proquest.com/docview/2679392505 |
| Volume | 7 |
| WOSCitedRecordID | wos000814637000014&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| journalDatabaseRights | – providerCode: PRVIEE databaseName: IEEE Xplore customDbUrl: eissn: 2377-3766 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0001527395 issn: 2377-3766 databaseCode: RIE dateStart: 20160101 isFulltext: true titleUrlDefault: https://ieeexplore.ieee.org/ providerName: IEEE – providerCode: PRVHPJ databaseName: ROAD: Directory of Open Access Scholarly Resources customDbUrl: eissn: 2377-3766 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0001527395 issn: 2377-3766 databaseCode: M~E dateStart: 20160101 isFulltext: true titleUrlDefault: https://road.issn.org providerName: ISSN International Centre |
| link | http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV3NS8MwFH848aAHv8X5MXLwIhjXdV3TeJvDMWEOEZXdSpomc0xb2aYHD_7tvtd2c6AI9pRDEkJ_eV95ye8BnFjPU-Sp88DaiKMkCh6hWeGxkhK907owOsiKTYheL-j35e0SnM3fwhhjsstn5pyaWS4_TvUbHZURG2yAMUMJSkL4-Vut7_MUYhKTjVkm0pHV7l0T4z_XxbAU1TDVj1ywPFkplR_6NzMq7Y3_LWcT1gvnkTVztLdgySTbsLZAKbgDttXsPl73LliTXeIGfHpR4xFDx5R1i4NJ3kopS00ERTHLWYFZQbI6yDumyYB30vHwI03YXRqlU3ajkuGszhe7V5PRZBce2lf3rQ4vailw7cralAvtS2m1U1NxDfWKkYEjYmV9YyPEJgq06_k2cA2xw6OIK9HwBJG9OdaLPFVT9T1YTnBl-8CcRoReplbSqRMXf0yNSNe1H1v8TFCG6uw_h7ogGqd6F89hFnA4MkRkQkImLJApw-l8xGtOsvFH3x1CYt6vAKEMRzMow0IKJ6Hro_aR5OQd_D7qEFZp7vz67REsT8dv5hhW9Pt0OBlXoHTzeVXJttkX-SLPVQ |
| linkProvider | IEEE |
| linkToHtml | http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV1LS8QwEB58gXrwLa7PHLwIRrvdbtt4WxdlF-sisoq3kqaJLmor-_Dgr3emza6CIthTDgkJ_ZLJTCb5PoBD43mSPHUeGpNwXIkBT3Bb4akUAr3TWqBVWIhNBJ1O-PAgbqbgePIWRmtdXD7TJ1QscvlprkZ0VEZssCHGDNMwS8pZ9rXW14kKcYmJ-jgX6YjT6LaBEaDrYmCKhpgUJL_tPYWYyg8LXGwrl8v_G9AKLFn3kTVKvFdhSmdrsPiNVHAdTLMR3bc7Z6zBznEKPr3K_jND15RF9miSN3PKUxNFUcpKXmBmaVYfy4p59shbeb_3kWfsNk_yIbuWWW-s9MW6cvA82IC7y4tus8WtmgJXrqgOeaB8IYxyqjKtomXRInSCVBpfmwTRSULler4JXU388LjIZVD3AqJ7c4yXeLIqa5swk-HItoA59QT9TCWFUyM2_pQKiaopPzX46bACp-P_HCtLNU6KFy9xEXI4IkZkYkImtshU4GjS4q2k2fij7johMalnQajA7hjK2K7DQez6aH8EuXnbv7c6gPlW9zqKo3bnagcWqJ_yMu4uzAz7I70Hc-p92Bv094vJ9gkGLdFt |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=CALVIN%3A+A+Benchmark+for+Language-Conditioned+Policy+Learning+for+Long-Horizon+Robot+Manipulation+Tasks&rft.jtitle=IEEE+robotics+and+automation+letters&rft.au=Mees%2C+Oier&rft.au=Hermann%2C+Lukas&rft.au=Rosete-Beas%2C+Erick&rft.au=Burgard%2C+Wolfram+Burgard&rft.date=2022-07-01&rft.issn=2377-3766&rft.eissn=2377-3766&rft.volume=7&rft.issue=3&rft.spage=7327&rft.epage=7334&rft_id=info:doi/10.1109%2FLRA.2022.3180108&rft.externalDBID=n%2Fa&rft.externalDocID=10_1109_LRA_2022_3180108 |