A Universal Empirical Dynamic Programming Algorithm for Continuous State MDPs

We propose universal randomized function approximation-based empirical value learning (EVL) algorithms for Markov decision processes. The "empirical" nature comes from each iteration being done empirically from samples available from simulations of the next state. This makes the Bellman op...

Bibliographic details
Published in: IEEE transactions on automatic control, vol. 65, no. 1, pp. 115-129
Main authors: Haskell, William B., Jain, Rahul, Sharma, Hiteshi, Yu, Pengqian
Format: Journal Article
Language: English
Published: New York, IEEE, 01.01.2020
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN:0018-9286, 1558-2523
Abstract We propose universal randomized function approximation-based empirical value learning (EVL) algorithms for Markov decision processes. The "empirical" nature comes from each iteration being done empirically from samples available from simulations of the next state. This makes the Bellman operator a random operator. A parametric and a nonparametric method for function approximation using a parametric function space and a reproducing kernel Hilbert space respectively are then combined with EVL. Both function spaces have the universal function approximation property. Basis functions are picked randomly. Convergence analysis is performed using a random operator framework with techniques from the theory of stochastic dominance. Finite time sample complexity bounds are derived for both universal approximate dynamic programming algorithms. Numerical experiments support the versatility and computational tractability of this approach.
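The loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm. It only assumes the general pattern the abstract names: sample states, replace the expectation in the Bellman operator with an average over simulated next states, and fit the result with randomly drawn basis functions. The toy one-dimensional MDP, the cost function, the cosine basis, the ridge fit, and every function name below are hypothetical choices made for illustration.

```python
import math
import random

def make_random_basis(num_features, rng):
    """Draw random cosine basis functions phi_j(s) = cos(w_j * s + b_j)."""
    params = [(rng.gauss(0.0, 3.0), rng.uniform(0.0, 2.0 * math.pi))
              for _ in range(num_features)]
    return lambda s: [math.cos(w * s + b) for w, b in params]

def solve_linear(a, y):
    """Solve a x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    m = [row[:] + [y[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_ridge(phi_rows, targets, lam=1e-3):
    """Least-squares fit of targets on the features, with a small ridge term."""
    k = len(phi_rows[0])
    gram = [[sum(p[i] * p[j] for p in phi_rows) + (lam if i == j else 0.0)
             for j in range(k)] for i in range(k)]
    rhs = [sum(p[i] * t for p, t in zip(phi_rows, targets)) for i in range(k)]
    return solve_linear(gram, rhs)

def simulate_next_state(s, a, rng):
    """Hypothetical simulator: noisy drift, clipped to the state space [0, 1]."""
    return min(1.0, max(0.0, s + 0.1 * a + rng.gauss(0.0, 0.05)))

def stage_cost(s, a):
    """Hypothetical cost: squared distance to 0.5 plus a small control effort."""
    return (s - 0.5) ** 2 + 0.01 * abs(a)

def empirical_value_learning(num_iters=15, num_states=80, num_next=10,
                             num_features=12, gamma=0.9, seed=0):
    rng = random.Random(seed)
    actions = (-1.0, 0.0, 1.0)
    basis = make_random_basis(num_features, rng)  # basis is picked randomly, once
    theta = [0.0] * num_features
    value = lambda s, th: sum(t * p for t, p in zip(th, basis(s)))
    for _ in range(num_iters):
        states = [rng.random() for _ in range(num_states)]
        targets = []
        for s in states:
            backups = []
            for a in actions:
                # Empirical Bellman backup: sample average over simulated next states.
                nxt = [simulate_next_state(s, a, rng) for _ in range(num_next)]
                avg = sum(value(x, theta) for x in nxt) / num_next
                backups.append(stage_cost(s, a) + gamma * avg)
            targets.append(min(backups))
        # Function-approximation step: project the backed-up values onto the basis.
        theta = fit_ridge([basis(s) for s in states], targets)
    return lambda s: value(s, theta)
```

Calling `empirical_value_learning()` returns a function that evaluates the fitted value estimate at any state in [0, 1]. Because the next-state samples are redrawn at every iteration, each backup is a different random realization of the Bellman operator, which is the "random operator" view the abstract refers to.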
Author Jain, Rahul
Haskell, William B.
Yu, Pengqian
Sharma, Hiteshi
Author_xml – sequence: 1
  givenname: William B.
  orcidid: 0000-0002-9518-4310
  surname: Haskell
  fullname: Haskell, William B.
  email: isehwb@nus.edu.sg
  organization: Department of Industrial and Systems Engineering, National University of Singapore, Singapore
– sequence: 2
  givenname: Rahul
  orcidid: 0000-0003-3786-8682
  surname: Jain
  fullname: Jain, Rahul
  email: rahul.jain@usc.edu
  organization: EE Department, University of Southern California, Los Angeles, CA, USA
– sequence: 3
  givenname: Hiteshi
  orcidid: 0000-0002-4057-0302
  surname: Sharma
  fullname: Sharma, Hiteshi
  email: hiteshis@usc.edu
  organization: EE Department, University of Southern California, Los Angeles, CA, USA
– sequence: 4
  givenname: Pengqian
  orcidid: 0000-0002-4660-6679
  surname: Yu
  fullname: Yu, Pengqian
  email: yupengqian@u.nus.edu
  organization: Department of Industrial and Systems Engineering, National University of Singapore, Singapore
CODEN IETAA9
CitedBy_id crossref_primary_10_1109_LCSYS_2021_3092196
crossref_primary_10_1109_ACCESS_2020_3001143
crossref_primary_10_1007_s10957_020_01747_1
crossref_primary_10_1016_j_automatica_2022_110179
crossref_primary_10_1093_imamci_dnaf014
crossref_primary_10_1016_j_ifacol_2023_10_854
crossref_primary_10_1287_mnsc_2020_00038
crossref_primary_10_1109_TAC_2024_3362686
crossref_primary_10_1109_TAC_2024_3371380
Cites_doi 10.1090/mbk/058
10.1287/moor.1040.0094
10.1002/9780470182963
10.3166/ejc.11.310-334
10.1137/040614384
10.1007/978-0-387-34675-5
10.1007/978-1-4899-7502-7_646-1
10.1016/0097-3165(95)90052-7
10.1016/j.acha.2005.03.001
10.1109/CDC.2017.8264011
10.1016/j.automatica.2010.05.021
10.1287/opre.51.6.850.24925
10.2307/2171751
10.1038/nature14236
10.1109/ALLERTON.2008.4797607
10.1287/moor.2015.0733
10.1023/A:1017928328829
10.1017/S0962492900002816
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020
DOI 10.1109/TAC.2019.2907414
DatabaseName IEEE Xplore (IEEE)
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Xplore Digital Library
CrossRef
Computer and Information Systems Abstracts
Electronics & Communications Abstracts
Mechanical & Transportation Engineering Abstracts
Technology Research Database
Engineering Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
Discipline Engineering
EISSN 1558-2523
EndPage 129
ExternalDocumentID 10_1109_TAC_2019_2907414
8678843
Genre orig-research
GrantInformation_xml – fundername: NSF
  grantid: CCF-1817212
– fundername: ONR Young Investigator
  grantid: #N000141210766
– fundername: Ministry of Education - Singapore; Singapore Ministry of Education
  grantid: MOE2015-T2-2-148
  funderid: 10.13039/501100001459
ISICitedReferencesCount 13
ISSN 0018-9286
IsPeerReviewed true
IsScholarly true
Issue 1
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
ORCID 0000-0002-4057-0302
0000-0003-3786-8682
0000-0002-9518-4310
0000-0002-4660-6679
PublicationDate 2020-Jan.
2020-1-00
20200101
PublicationDateYYYYMMDD 2020-01-01
PublicationDate_xml – month: 01
  year: 2020
  text: 2020-Jan.
PublicationDecade 2020
PublicationPlace New York
PublicationPlace_xml – name: New York
PublicationTitle IEEE transactions on automatic control
PublicationTitleAbbrev TAC
PublicationYear 2020
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
StartPage 115
SubjectTerms Algorithms
Approximation
Approximation algorithms
Basis functions
Complexity theory
Computer simulation
Continuous state-space Markov decision processes (MDPs)
Convergence
Dynamic programming
dynamic programming (DP)
Empirical analysis
Function approximation
Function space
Heuristic algorithms
Hilbert space
Iterative methods
Machine learning
Markov processes
Mathematical analysis
Operators (mathematics)
Probabilistic logic
reinforcement learning (RL)
Title A Universal Empirical Dynamic Programming Algorithm for Continuous State MDPs
URI https://ieeexplore.ieee.org/document/8678843
https://www.proquest.com/docview/2332197908
Volume 65