TOUCHDOWN: Natural Language Navigation and Spatial Reasoning in Visual Street Environments

Bibliographic Details
Published in:	Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online), pp. 12530-12539
Main Authors:	Chen, Howard, Suhr, Alane, Misra, Dipendra, Snavely, Noah, Artzi, Yoav
Format:	Conference Proceeding
Language:	English
Published:	IEEE, June 2019
Subjects:	Cognition; Data collection; Datasets and Evaluation; Linguistics; Navigation; Spatial databases; Urban areas; Vision + Language; Visual Reasoning; Visualization
ISSN:	1063-6919

Abstract	We study the problem of jointly reasoning about language and vision through a navigation and spatial reasoning task. We introduce the Touchdown task and dataset, where an agent must first follow navigation instructions in a Street View environment to a goal position, and then guess a location in its observed environment described in natural language to find a hidden object. The data contains 9,326 examples of English instructions and spatial descriptions paired with demonstrations. We perform a qualitative linguistic analysis and show that the data displays a rich use of spatial reasoning. Empirical analysis shows the data presents an open challenge to existing methods.
Affiliations	Chen, Howard (Cornell University); Suhr, Alane (Cornell University); Misra, Dipendra (Cornell University); Snavely, Noah (Cornell University and Google AI); Artzi, Yoav (Cornell University)
ContentType	Conference Proceeding
DOI	10.1109/CVPR.2019.01282
DatabaseName	IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP) 1998-present
Database_xml	– sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher
DeliveryMethod	fulltext_linktorsrc
Discipline	Applied Sciences
EISBN	9781728132938 (ISBN-13); 1728132932 (ISBN-10)
EISSN	1063-6919
EndPage	12539
ExternalDocumentID	8954308
Genre	orig-research
ISICitedReferencesCount	188
ISICitedReferencesURI	http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000542649306016&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate	Wed Aug 27 02:24:35 EDT 2025
IsPeerReviewed	false
IsScholarly	true
Language	English
LinkModel	DirectLink
PageCount	10
ParticipantIDs	ieee_primary_8954308
PublicationCentury	2000
PublicationDate	2019-June
PublicationDateYYYYMMDD	2019-06-01
PublicationDecade	2010
PublicationTitle	Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online)
PublicationTitleAbbrev	CVPR
PublicationYear	2019
Publisher	IEEE
SSID	ssj0003211698
SourceID	ieee
SourceType	Publisher
StartPage	12530
Title	TOUCHDOWN: Natural Language Navigation and Spatial Reasoning in Visual Street Environments
URI	https://ieeexplore.ieee.org/document/8954308
WOSCitedRecordID	wos000542649306016
hasFullText	1
inHoldings	1