Action-Inclusive Multi-Future Prediction Using a Generative Model in Human-Related Scenes for Mobile Robots
| Published in: | IEEE Access, Vol. 13, pp. 167034-167044 |
|---|---|
| Main Authors: | Xu, Chenfei; Ahmad, Huthaifa; Okadome, Yuya; Ishiguro, Hiroshi; Nakamura, Yutaka |
| Format: | Journal Article |
| Language: | English |
| Published: | Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2025 |
| ISSN: | 2169-3536 |
| Abstract | Mobility in daily unstructured environments, particularly in human-centered scenarios, remains a fundamental challenge for mobile robots. While traditional prediction-based approaches primarily estimate partial features for robot decision making, such as position and velocity, recent world models enable direct prediction of future sensory data. However, their potential in human-inclusive environments remains underexplored. To assess the feasibility of world models in facilitating human-robot interactions, we propose a robot framework using a deep generative model that jointly predicts multiple future observations and actions. Our approach leverages first-person-view (FPV) raw sensor data, integrating both observations and actions to enhance predictive capabilities in dynamic human-populated settings. Experimental results demonstrate that our method is capable of generating a range of candidate futures for one condition and planning actions based on observation guidance. These findings highlight the potential of our approach for facilitating autonomous robots' coexistence with humans. |
|---|---|
| Author | Xu, Chenfei; Nakamura, Yutaka; Ahmad, Huthaifa; Okadome, Yuya; Ishiguro, Hiroshi |
| Author_xml | 1. Xu, Chenfei (ORCID: 0009-0008-9778-7853), Graduate School of Engineering Science, Osaka University, Osaka, Japan; 2. Ahmad, Huthaifa (ORCID: 0000-0002-8865-955X), RIKEN Information Research and Development and Strategy Headquarters, RIKEN, Kyoto, Japan; 3. Okadome, Yuya (ORCID: 0009-0008-2961-8417), Department of Information and Computer Technology, Tokyo University of Science, Tokyo, Japan; 4. Ishiguro, Hiroshi (ORCID: 0000-0002-0805-7648), Graduate School of Engineering Science, Osaka University, Osaka, Japan; 5. Nakamura, Yutaka (ORCID: 0000-0001-6307-5104; email: yutaka.nakamura@grp.riken.jp), RIKEN Information Research and Development and Strategy Headquarters, RIKEN, Kyoto, Japan |
| CODEN | IAECCG |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2025 |
| DOI | 10.1109/ACCESS.2025.3611812 |
| Discipline | Engineering |
| EISSN | 2169-3536 |
| EndPage | 167044 |
| Genre | orig-research |
| GrantInformation_xml | Funder: Japan Science and Technology Agency (JST) Moonshot Research and Development Grant through the Development of Semi-Autonomous CA; grant ID: JPMJMS2011 |
| ISICitedReferencesCount | 0 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Language | English |
| License | https://creativecommons.org/licenses/by-nc-nd/4.0 |
| OpenAccessLink | https://doaj.org/article/395286f15f4741f1af442fdfda72aedb |
| PageCount | 11 |
| PublicationDate | 2025-01-01 |
| PublicationTitle | IEEE Access |
| PublicationTitleAbbrev | Access |
| StartPage | 167034 |
| SubjectTerms | Behavioral sciences; Data models; Data sets for robot learning; Deep learning methods; Human-aware motion planning; Human–robot interaction; Mobile robots; Planning; Predictive models; Robot sensing systems; Robots; Transformers; Videos; Visualization |
| URI | https://ieeexplore.ieee.org/document/11170425 https://www.proquest.com/docview/3256159990 https://doaj.org/article/395286f15f4741f1af442fdfda72aedb |
| Volume | 13 |