A Universal Empirical Dynamic Programming Algorithm for Continuous State MDPs
We propose universal randomized function approximation-based empirical value learning (EVL) algorithms for Markov decision processes. The "empirical" nature comes from each iteration being done empirically from samples available from simulations of the next state. This makes the Bellman op...
Saved in:
| Published in: | IEEE Transactions on Automatic Control, Volume 65, Issue 1, pp. 115-129 |
|---|---|
| Main authors: | William B. Haskell, Rahul Jain, Hiteshi Sharma, Pengqian Yu |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: IEEE, 1 January 2020; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Subjects: | |
| ISSN: | 0018-9286, 1558-2523 |
| Online access: | Get full text |
| Abstract | We propose universal randomized function approximation-based empirical value learning (EVL) algorithms for Markov decision processes. The "empirical" nature comes from each iteration being done empirically from samples available from simulations of the next state. This makes the Bellman operator a random operator. A parametric and a nonparametric method for function approximation using a parametric function space and a reproducing kernel Hilbert space respectively are then combined with EVL. Both function spaces have the universal function approximation property. Basis functions are picked randomly. Convergence analysis is performed using a random operator framework with techniques from the theory of stochastic dominance. Finite time sample complexity bounds are derived for both universal approximate dynamic programming algorithms. Numerical experiments support the versatility and computational tractability of this approach. |
|---|---|
| Author | Jain, Rahul; Haskell, William B.; Yu, Pengqian; Sharma, Hiteshi |
| Author_xml | – sequence: 1 givenname: William B. orcidid: 0000-0002-9518-4310 surname: Haskell fullname: Haskell, William B. email: isehwb@nus.edu.sg organization: Department of Industrial and Systems Engineering, National University of Singapore, Singapore – sequence: 2 givenname: Rahul orcidid: 0000-0003-3786-8682 surname: Jain fullname: Jain, Rahul email: rahul.jain@usc.edu organization: EE Department, University of Southern California, Los Angeles, CA, USA – sequence: 3 givenname: Hiteshi orcidid: 0000-0002-4057-0302 surname: Sharma fullname: Sharma, Hiteshi email: hiteshis@usc.edu organization: EE Department, University of Southern California, Los Angeles, CA, USA – sequence: 4 givenname: Pengqian orcidid: 0000-0002-4660-6679 surname: Yu fullname: Yu, Pengqian email: yupengqian@u.nus.edu organization: Department of Industrial and Systems Engineering, National University of Singapore, Singapore |
| CODEN | IETAA9 |
| CitedBy_id | crossref_primary_10_1109_LCSYS_2021_3092196 crossref_primary_10_1109_ACCESS_2020_3001143 crossref_primary_10_1007_s10957_020_01747_1 crossref_primary_10_1016_j_automatica_2022_110179 crossref_primary_10_1093_imamci_dnaf014 crossref_primary_10_1016_j_ifacol_2023_10_854 crossref_primary_10_1287_mnsc_2020_00038 crossref_primary_10_1109_TAC_2024_3362686 crossref_primary_10_1109_TAC_2024_3371380 |
| Cites_doi | 10.1090/mbk/058 10.1287/moor.1040.0094 10.1002/9780470182963 10.3166/ejc.11.310-334 10.1137/040614384 10.1007/978-0-387-34675-5 10.1007/978-1-4899-7502-7_646-1 10.1016/0097-3165(95)90052-7 10.1016/j.acha.2005.03.001 10.1109/CDC.2017.8264011 10.1016/j.automatica.2010.05.021 10.1287/opre.51.6.850.24925 10.2307/2171751 10.1038/nature14236 10.1109/ALLERTON.2008.4797607 10.1287/moor.2015.0733 10.1023/A:1017928328829 10.1017/S0962492900002816 |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020 |
| Copyright_xml | – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020 |
| DBID | 97E RIA RIE AAYXX CITATION 7SC 7SP 7TB 8FD FR3 JQ2 L7M L~C L~D |
| DOI | 10.1109/TAC.2019.2907414 |
| DatabaseName | IEEE Xplore (IEEE) IEEE All-Society Periodicals Package (ASPP) 1998–Present IEEE Xplore Digital Library CrossRef Computer and Information Systems Abstracts Electronics & Communications Abstracts Mechanical & Transportation Engineering Abstracts Technology Research Database Engineering Research Database ProQuest Computer Science Collection Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional |
| DatabaseTitle | CrossRef Technology Research Database Computer and Information Systems Abstracts – Academic Mechanical & Transportation Engineering Abstracts Electronics & Communications Abstracts ProQuest Computer Science Collection Computer and Information Systems Abstracts Engineering Research Database Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Professional |
| DatabaseTitleList | Technology Research Database |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Engineering |
| EISSN | 1558-2523 |
| EndPage | 129 |
| ExternalDocumentID | 10_1109_TAC_2019_2907414 8678843 |
| Genre | orig-research |
| GrantInformation_xml | – fundername: NSF grantid: CCF-1817212 – fundername: ONR Young Investigator grantid: #N000141210766 – fundername: Ministry of Education - Singapore; Singapore Ministry of Education grantid: MOE2015-T2-2-148 funderid: 10.13039/501100001459 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 13 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000506851100009&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 0018-9286 |
| IngestDate | Mon Jun 30 10:16:43 EDT 2025 Tue Nov 18 22:18:32 EST 2025 Sat Nov 29 05:40:54 EST 2025 Wed Aug 27 02:38:53 EDT 2025 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 1 |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
| LinkModel | DirectLink |
| Notes | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 |
| ORCID | 0000-0002-4057-0302 0000-0003-3786-8682 0000-0002-9518-4310 0000-0002-4660-6679 |
| PQID | 2332197908 |
| PQPubID | 85475 |
| PageCount | 15 |
| ParticipantIDs | crossref_primary_10_1109_TAC_2019_2907414 ieee_primary_8678843 crossref_citationtrail_10_1109_TAC_2019_2907414 proquest_journals_2332197908 |
| PublicationCentury | 2000 |
| PublicationDate | 2020-Jan. 2020-1-00 20200101 |
| PublicationDateYYYYMMDD | 2020-01-01 |
| PublicationDate_xml | – month: 01 year: 2020 text: 2020-Jan. |
| PublicationDecade | 2020 |
| PublicationPlace | New York |
| PublicationPlace_xml | – name: New York |
| PublicationTitle | IEEE transactions on automatic control |
| PublicationTitleAbbrev | TAC |
| PublicationYear | 2020 |
| Publisher | IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Publisher_xml | – name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| SSID | ssj0016441 |
| Score | 2.4197307 |
| SourceID | proquest crossref ieee |
| SourceType | Aggregation Database Enrichment Source Index Database Publisher |
| StartPage | 115 |
| SubjectTerms | Algorithms; Approximation; Approximation algorithms; Basis functions; Complexity theory; Computer simulation; Continuous state-space Markov decision processes (MDPs); Convergence; Dynamic programming; dynamic programming (DP); Empirical analysis; Function approximation; Function space; Heuristic algorithms; Hilbert space; Iterative methods; Machine learning; Markov processes; Mathematical analysis; Operators (mathematics); Probabilistic logic; reinforcement learning (RL) |
| Title | A Universal Empirical Dynamic Programming Algorithm for Continuous State MDPs |
| URI | https://ieeexplore.ieee.org/document/8678843 https://www.proquest.com/docview/2332197908 |
| Volume | 65 |
| WOSCitedRecordID | wos000506851100009&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| journalDatabaseRights | – providerCode: PRVIEE databaseName: IEEE Electronic Library (IEL) customDbUrl: eissn: 1558-2523 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0016441 issn: 0018-9286 databaseCode: RIE dateStart: 19630101 isFulltext: true titleUrlDefault: https://ieeexplore.ieee.org/ providerName: IEEE |
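
The abstract describes empirical value learning in which each Bellman update is estimated from simulated next states and then fitted with randomly picked basis functions. The following is a minimal sketch of that idea, not the authors' implementation. Everything in it is an assumption made for illustration: the one-dimensional state space [0, 1], the random cosine basis, the ridge-regularized least-squares fit, the cost-minimizing convention, and the hypothetical names `empirical_value_learning`, `toy_cost`, and `toy_simulate`.

```python
import numpy as np


def random_features(x, omega, phase):
    """Random cosine basis (assumed form): phi_j(x) = cos(omega_j * x + phase_j)."""
    return np.cos(np.outer(x, omega) + phase)


def empirical_value_learning(cost, simulate, actions, gamma=0.9, n_states=200,
                             n_next=20, n_basis=50, n_iters=30, ridge=1e-3,
                             seed=0):
    """Illustrative empirical value iteration on the state space [0, 1]."""
    rng = np.random.default_rng(seed)
    # Start from v_0 = 0: a single basis function with a zero coefficient.
    omega, phase, alpha = np.zeros(1), np.zeros(1), np.zeros(1)

    def value(x):
        # Current value estimate; reads the latest omega, phase, alpha.
        return random_features(x, omega, phase) @ alpha

    for _ in range(n_iters):
        # Sample states at which the Bellman update is evaluated.
        x = rng.uniform(0.0, 1.0, size=n_states)
        q = np.empty((len(actions), n_states))
        for i, a in enumerate(actions):
            # Empirical expectation: average the current value estimate
            # over n_next simulated next states per sampled state.
            nxt = simulate(np.repeat(x, n_next), a, rng)
            expected = value(nxt).reshape(n_states, n_next).mean(axis=1)
            q[i] = cost(x, a) + gamma * expected
        targets = q.min(axis=0)  # empirical Bellman update (cost minimization)

        # Draw a fresh random basis and fit only the linear coefficients.
        new_omega = rng.normal(0.0, 5.0, size=n_basis)
        new_phase = rng.uniform(0.0, 2.0 * np.pi, size=n_basis)
        phi = random_features(x, new_omega, new_phase)
        gram = phi.T @ phi + ridge * np.eye(n_basis)
        new_alpha = np.linalg.solve(gram, phi.T @ targets)
        omega, phase, alpha = new_omega, new_phase, new_alpha

    return value


def toy_cost(x, a):
    """Hypothetical stage cost: squared distance from the middle of [0, 1]."""
    return (x - 0.5) ** 2


def toy_simulate(x, a, rng):
    """Hypothetical dynamics: shift by the action, add noise, clip to [0, 1]."""
    return np.clip(x + a + 0.05 * rng.standard_normal(x.shape), 0.0, 1.0)
```

Calling `empirical_value_learning(toy_cost, toy_simulate, actions=[-0.1, 0.0, 0.1])` returns a function that evaluates the fitted value estimate at arbitrary states. Because the next-state samples are random, each update is a random operator applied to the previous estimate, which is the property the abstract highlights.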