TOUCHDOWN: Natural Language Navigation and Spatial Reasoning in Visual Street Environments

Bibliographic Details
Published in: Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online), pp. 12530-12539
Main Authors: Chen, Howard; Suhr, Alane; Misra, Dipendra; Snavely, Noah; Artzi, Yoav
Format: Conference Proceeding
Language: English
Published: IEEE 01.06.2019
ISSN:1063-6919
Abstract We study the problem of jointly reasoning about language and vision through a navigation and spatial reasoning task. We introduce the Touchdown task and dataset, where an agent must first follow navigation instructions in a Street View environment to a goal position, and then guess a location in its observed environment described in natural language to find a hidden object. The data contains 9326 examples of English instructions and spatial descriptions paired with demonstrations. We perform qualitative linguistic analysis, and show that the data displays a rich use of spatial reasoning. Empirical analysis shows the data presents an open challenge to existing methods.
Author_xml – sequence: 1
  givenname: Howard
  surname: Chen
  fullname: Chen, Howard
  organization: Cornell
– sequence: 2
  givenname: Alane
  surname: Suhr
  fullname: Suhr, Alane
  organization: Cornell Univ
– sequence: 3
  givenname: Dipendra
  surname: Misra
  fullname: Misra, Dipendra
  organization: Cornell Univ
– sequence: 4
  givenname: Noah
  surname: Snavely
  fullname: Snavely, Noah
  organization: Cornell Univ. and Google AI
– sequence: 5
  givenname: Yoav
  surname: Artzi
  fullname: Artzi, Yoav
  organization: Cornell Univ
ContentType Conference Proceeding
DOI 10.1109/CVPR.2019.01282
Discipline Applied Sciences
EISBN 9781728132938
1728132932
EISSN 1063-6919
EndPage 12539
ExternalDocumentID 8954308
Genre orig-research
ISICitedReferencesCount 188
IsPeerReviewed false
IsScholarly true
Language English
PageCount 10
PublicationCentury 2000
PublicationDate 2019-June
PublicationDateYYYYMMDD 2019-06-01
PublicationDate_xml – month: 06
  year: 2019
  text: 2019-June
PublicationDecade 2010
PublicationTitle Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online)
PublicationTitleAbbrev CVPR
PublicationYear 2019
Publisher IEEE
Publisher_xml – name: IEEE
StartPage 12530
SubjectTerms Cognition
Data collection
Datasets and Evaluation
Linguistics
Navigation
Spatial databases
Urban areas
Vision + Language
Visual Reasoning
Visualization
Title TOUCHDOWN: Natural Language Navigation and Spatial Reasoning in Visual Street Environments
URI https://ieeexplore.ieee.org/document/8954308