TOUCHDOWN: Natural Language Navigation and Spatial Reasoning in Visual Street Environments
We study the problem of jointly reasoning about language and vision through a navigation and spatial reasoning task. We introduce the Touchdown task and dataset, where an agent must first follow navigation instructions in a Street View environment to a goal position, and then guess a location in its...
| Published in: | Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online), pp. 12530-12539 |
|---|---|
| Main authors: | Chen, Howard; Suhr, Alane; Misra, Dipendra; Snavely, Noah; Artzi, Yoav |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 2019-06-01 |
| Subjects: | |
| ISSN: | 1063-6919 |
| Online access: | Get full text |
| Abstract | We study the problem of jointly reasoning about language and vision through a navigation and spatial reasoning task. We introduce the Touchdown task and dataset, where an agent must first follow navigation instructions in a Street View environment to a goal position, and then guess a location in its observed environment described in natural language to find a hidden object. The data contains 9326 examples of English instructions and spatial descriptions paired with demonstrations. We perform qualitative linguistic analysis, and show that the data displays a rich use of spatial reasoning. Empirical analysis shows the data presents an open challenge to existing methods. |
|---|---|
| Author | Suhr, Alane; Chen, Howard; Artzi, Yoav; Misra, Dipendra; Snavely, Noah |
| Author_xml | 1. Chen, Howard (Cornell); 2. Suhr, Alane (Cornell Univ); 3. Misra, Dipendra (Cornell Univ); 4. Snavely, Noah (Cornell Univ. and Google AI); 5. Artzi, Yoav (Cornell Univ) |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IH CBEJK RIE RIO |
| DOI | 10.1109/CVPR.2019.01282 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP) 1998-present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Applied Sciences |
| EISBN | 9781728132938; 1728132932 |
| EISSN | 1063-6919 |
| EndPage | 12539 |
| ExternalDocumentID | 8954308 |
| Genre | orig-research |
| ISICitedReferencesCount | 188 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000542649306016&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| IngestDate | Wed Aug 27 02:24:35 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| PageCount | 10 |
| ParticipantIDs | ieee_primary_8954308 |
| PublicationCentury | 2000 |
| PublicationDate | 2019-June |
| PublicationDateYYYYMMDD | 2019-06-01 |
| PublicationDate_xml | – month: 06 year: 2019 text: 2019-June |
| PublicationDecade | 2010 |
| PublicationTitle | Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online) |
| PublicationTitleAbbrev | CVPR |
| PublicationYear | 2019 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 12530 |
| SubjectTerms | Cognition; Data collection; Datasets and Evaluation; Linguistics; Navigation; Spatial databases; Urban areas; Vision + Language; Visual Reasoning; Visualization |
| Title | TOUCHDOWN: Natural Language Navigation and Spatial Reasoning in Visual Street Environments |
| URI | https://ieeexplore.ieee.org/document/8954308 |
| WOSCitedRecordID | wos000542649306016&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |