Action-Inclusive Multi-Future Prediction Using a Generative Model in Human-Related Scenes for Mobile Robots

Bibliographic Details
Published in: IEEE Access, Vol. 13, pp. 167034-167044
Main Authors: Xu, Chenfei; Ahmad, Huthaifa; Okadome, Yuya; Ishiguro, Hiroshi; Nakamura, Yutaka
Format: Journal Article
Language: English
Published: Piscataway: IEEE, 2025
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 2169-3536
Abstract Mobility in daily unstructured environments, particularly in human-centered scenarios, remains a fundamental challenge for mobile robots. While traditional prediction-based approaches primarily estimate partial features for robot decision making, such as position and velocity, recent world models enable direct prediction of future sensory data. However, their potential in human-inclusive environments remains underexplored. To assess the feasibility of world models in facilitating human-robot interactions, we propose a robot framework using a deep generative model that jointly predicts multiple future observations and actions. Our approach leverages raw first-person-view (FPV) sensor data, integrating both observations and actions to enhance predictive capabilities in dynamic human-populated settings. Experimental results demonstrate that our method can generate a range of candidate futures for a single condition and plan actions based on observation guidance. These findings highlight the potential of our approach for facilitating autonomous robots' coexistence with humans.
Author_xml – sequence: 1
  givenname: Chenfei
  orcidid: 0009-0008-9778-7853
  surname: Xu
  fullname: Xu, Chenfei
  organization: Graduate School of Engineering Science, Osaka University, Osaka, Japan
– sequence: 2
  givenname: Huthaifa
  orcidid: 0000-0002-8865-955X
  surname: Ahmad
  fullname: Ahmad, Huthaifa
  organization: RIKEN Information Research and Development and Strategy Headquarters, RIKEN, Kyoto, Japan
– sequence: 3
  givenname: Yuya
  orcidid: 0009-0008-2961-8417
  surname: Okadome
  fullname: Okadome, Yuya
  organization: Department of Information and Computer Technology, Tokyo University of Science, Tokyo, Japan
– sequence: 4
  givenname: Hiroshi
  orcidid: 0000-0002-0805-7648
  surname: Ishiguro
  fullname: Ishiguro, Hiroshi
  organization: Graduate School of Engineering Science, Osaka University, Osaka, Japan
– sequence: 5
  givenname: Yutaka
  orcidid: 0000-0001-6307-5104
  surname: Nakamura
  fullname: Nakamura, Yutaka
  email: yutaka.nakamura@grp.riken.jp
  organization: RIKEN Information Research and Development and Strategy Headquarters, RIKEN, Kyoto, Japan
CODEN IAECCG
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2025
DOI 10.1109/ACCESS.2025.3611812
Discipline Engineering
EISSN 2169-3536
EndPage 167044
Genre orig-research
GrantInformation_xml – fundername: Japan Science and Technology Agency (JST) Moonshot Research and Development Grant through the Development of Semi-Autonomous CA
  grantid: JPMJMS2011
ISICitedReferencesCount 0
ISSN 2169-3536
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Language English
License https://creativecommons.org/licenses/by-nc-nd/4.0
OpenAccessLink https://doaj.org/article/395286f15f4741f1af442fdfda72aedb
PageCount 11
PublicationDate 20250000
2025-00-00
20250101
2025-01-01
PublicationDateYYYYMMDD 2025-01-01
PublicationDate_xml – year: 2025
  text: 20250000
PublicationDecade 2020
PublicationPlace Piscataway
PublicationPlace_xml – name: Piscataway
PublicationTitle IEEE access
PublicationTitleAbbrev Access
PublicationYear 2025
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
StartPage 167034
SubjectTerms Behavioral sciences
Data models
Data sets for robot learning
deep learning methods
human-aware motion planning
human–robot interaction
Mobile robots
Planning
Predictive models
Robot sensing systems
Robots
Transformers
Videos
Visualization
SummonAdditionalLinks – databaseName: IEEE Electronic Library (IEL)
  dbid: RIE
  link: http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV1Lb9QwEB61VQ9w4NUiFgrygSNuk9hJ7OOy6opTVdEi9WY59liqQLtoN8vvZ8ZxCxXi0FviTBTH33g8Hs8D4GM0g_cxRWlVjVIb3UrbhiiZeRrro8Kkc7GJ_uLC3NzYyxKsnmNhEDE7n-EpX-az_LgOOzaVndVcJoWYbB_2-76bgrXuDSpcQcK2fcksVFf2bL5Y0E_QHrBpT1XHEZbNg9UnJ-kvVVX-EcV5fVk-f2TPXsCzokiK-YT8S9jD1St4-ld6wSP4Ps9BC5JkwI8du6mLHG4rlzmPiLjc8CENU4jsOCC8mJJQj5mUS-SI25XIZn6ZneYwiqvAwlGQqksUA4kU8XU9rMftMXxbnl8vvshSXEEGGotR6ikVvglWxyppj75HWvtDSKqvBtsNVnmFxledSdrEzitqpAZUdIdE9RoOVusVvgGha3pkbOAKRDr2ZsCo09BVqbLBk743g093g-5-Tjk0XN57VNZNGDnGyBWMZvCZgbkn5QTYuYFG3JX55JRtG9Oluk2adKJU-6R1k4jpfN94jMMMjhmlP98rAM3g5A5nV2br1inS-0ito4X57X9eewdPuIuT7eUEDsbNDt_DYfg13m43HzIj_gaWvt0H
  priority: 102
  providerName: IEEE
URI https://ieeexplore.ieee.org/document/11170425
https://www.proquest.com/docview/3256159990
https://doaj.org/article/395286f15f4741f1af442fdfda72aedb
Volume 13