CALVIN: A Benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks

Bibliographic details
Published in: IEEE Robotics and Automation Letters, Vol. 7, No. 3, pp. 7327-7334
Main authors: Mees, Oier; Hermann, Lukas; Rosete-Beas, Erick; Burgard, Wolfram
Format: Journal Article
Language: English
Published: Piscataway: IEEE, 1 July 2022
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 2377-3766
Abstract General-purpose robots coexisting with humans in their environment must learn to relate human language to their perceptions and actions to be useful in a range of daily tasks. Moreover, they need to acquire a diverse repertoire of general-purpose skills that allow composing long-horizon tasks by following unconstrained language instructions. In this letter, we present CALVIN (Composing Actions from Language and Vision), an open-source simulated benchmark for learning long-horizon language-conditioned tasks. Our aim is to make it possible to develop agents that can solve many robotic manipulation tasks over a long horizon, from onboard sensors, and specified only via human language. CALVIN tasks are more complex in terms of sequence length, action space, and language than existing vision-and-language task datasets, and the benchmark supports flexible specification of sensor suites. We evaluate the agents zero-shot on novel language instructions and in novel environments. We show that a baseline model based on multi-context imitation learning performs poorly on CALVIN, suggesting that there is significant room for developing innovative agents that learn to relate human language to their world models with this benchmark.
Author Hermann, Lukas
Burgard, Wolfram
Mees, Oier
Rosete-Beas, Erick
Author_xml – sequence: 1
  givenname: Oier
  orcidid: 0000-0001-6020-9744
  surname: Mees
  fullname: Mees, Oier
  email: meeso@informatik.uni-freiburg.de
  organization: University of Freiburg, Freiburg, Germany
– sequence: 2
  givenname: Lukas
  surname: Hermann
  fullname: Hermann, Lukas
  email: hermannl@informatik.uni-freiburg.de
  organization: University of Freiburg, Freiburg, Germany
– sequence: 3
  givenname: Erick
  surname: Rosete-Beas
  fullname: Rosete-Beas, Erick
  email: erick.rosete@students.uni-freiburg.de
  organization: University of Freiburg, Freiburg, Germany
– sequence: 4
  givenname: Wolfram
  orcidid: 0000-0002-5680-6500
  surname: Burgard
  fullname: Burgard, Wolfram
  email: burgard@informatik.uni-freiburg.de
  organization: University of Freiburg, Freiburg, Germany
CODEN IRALC6
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022
DOI 10.1109/LRA.2022.3180108
DatabaseName IEEE Xplore (IEEE)
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Electronic Library (IEL)
CrossRef
Computer and Information Systems Abstracts
Electronics & Communications Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
Discipline Engineering
EISSN 2377-3766
EndPage 7334
ExternalDocumentID 10_1109_LRA_2022_3180108
9788026
Genre orig-research
GrantInformation_xml – fundername: German Federal Ministry of Education and Research
  grantid: 01IS18040B-OML
ISICitedReferencesCount 94
ISSN 2377-3766
IsPeerReviewed true
IsScholarly true
Issue 3
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
ORCID 0000-0002-5680-6500
0000-0001-6020-9744
PageCount 8
PublicationDate 2022-07-01
PublicationPlace Piscataway
PublicationTitle IEEE Robotics and Automation Letters
PublicationTitleAbbrev LRA
PublicationYear 2022
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
SubjectTerms Benchmark testing
Benchmarks
Cameras
Data sets for robot learning
Grippers
Horizon
Imitation learning
Language
Machine learning for robot control
Natural dialog for HRI
Robot learning
Robot sensing systems
Robot vision systems
Robots
Task analysis
Task complexity
Title CALVIN: A Benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
URI https://ieeexplore.ieee.org/document/9788026
https://www.proquest.com/docview/2679392505
Volume 7