ImageInThat: Manipulating Images to Convey User Instructions to Robots
| Published in: | 2025 20th ACM/IEEE International Conference on Human-Robot Interaction (HRI), pp. 757-766 |
|---|---|
| Main authors: | Karthik Mahadevan, Blaine Lewis, Jiannan Li, Bilge Mutlu, Anthony Tang, Tovi Grossman |
| Format: | Conference paper |
| Language: | English |
| Publication details: | IEEE, 4 March 2025 |
| Abstract | Foundation models are rapidly improving the capability of robots in performing everyday tasks autonomously, such as meal preparation, yet robots will still need to be instructed by humans due to model performance, the difficulty of capturing user preferences, and the need for user agency. Robots can be instructed using various methods: natural language conveys immediate instructions but can be abstract or ambiguous, whereas end-user programming supports longer-horizon tasks, but its interfaces face difficulties in capturing user intent. In this work, we propose using direct manipulation of images as an alternative paradigm to instruct robots, and introduce a specific instantiation called ImageInThat, which allows users to perform direct manipulation on images in a timeline-style interface to generate robot instructions. Through a user study, we demonstrate the efficacy of ImageInThat to instruct robots in kitchen manipulation tasks, comparing it to a text-based natural language instruction method. The results show that participants were faster with ImageInThat and preferred to use it over the text-based method. Supplementary material including code can be found at: https://image-in-that.github.io/. |
|---|---|
| Author | Tang, Anthony; Mahadevan, Karthik; Lewis, Blaine; Mutlu, Bilge; Grossman, Tovi; Li, Jiannan |
| Author_xml | 1. Mahadevan, Karthik (karthikm@dgp.toronto.edu), University of Toronto, Department of Computer Science, Toronto, Canada; 2. Lewis, Blaine (blaine@dgp.toronto.edu), University of Toronto, Department of Computer Science, Toronto, Canada; 3. Li, Jiannan (jiannanli@smu.edu.sg), School of Computing & Information Systems, Singapore Management University, Singapore, Singapore; 4. Mutlu, Bilge (bilge@cs.wisc.edu), University of Wisconsin-Madison, Department of Computer Sciences, Madison, USA; 5. Tang, Anthony (tonyt@smu.edu.sg), School of Computing & Information Systems, Singapore Management University, Singapore, Singapore; 6. Grossman, Tovi (tovi@dgp.toronto.edu), University of Toronto, Department of Computer Science, Toronto, Canada |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1109/HRI61500.2025.10974179 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Xplore POP ALL IEEE Xplore All Conference Proceedings IEEE/IET Electronic Library (IEL) (UW System Shared) IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE/IET Electronic Library (IEL) (UW System Shared) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| EISBN | 9798350378931 |
| EndPage | 766 |
| ExternalDocumentID | 10974179 |
| Genre | orig-research |
| GrantInformation_xml | Funder: National Science Foundation (funder ID: 10.13039/100000001); grant ID: IIS-1925043 |
| ISICitedReferencesCount | 0 |
| IngestDate | Thu May 29 05:57:37 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| LinkModel | DirectLink |
| PageCount | 10 |
| ParticipantIDs | ieee_primary_10974179 |
| PublicationCentury | 2000 |
| PublicationDate | 2025-March-4 |
| PublicationDateYYYYMMDD | 2025-03-04 |
| PublicationDate_xml | – month: 03 year: 2025 text: 2025-March-4 day: 04 |
| PublicationDecade | 2020 |
| PublicationTitle | 2025 20th ACM/IEEE International Conference on Human-Robot Interaction (HRI) |
| PublicationTitleAbbrev | HRI |
| PublicationYear | 2025 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| Score | 1.9007237 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 757 |
| SubjectTerms | Codes; direct manipulation; end-user robot programming; Faces; Foundation models; Human-robot interaction; Natural languages; Prototypes; robot instruction following; Robot programming; Robots |
| Title | ImageInThat: Manipulating Images to Convey User Instructions to Robots |
| URI | https://ieeexplore.ieee.org/document/10974179 |