MEC: A Near-Optimal Online Reinforcement Learning Algorithm for Continuous Deterministic Systems
In this paper, the first probably approximately correct (PAC) algorithm for continuous deterministic systems that does not rely on any system dynamics is proposed. It combines the state aggregation technique with the efficient-exploration principle and makes efficient use of online observed samples....
Saved in:
| Published in: | IEEE Transactions on Neural Networks and Learning Systems, Volume 26, Issue 2, pp. 346-356 |
|---|---|
| Main authors: | Dongbin Zhao; Yuanheng Zhu |
| Format: | Journal Article |
| Language: | English |
| Published: | United States: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01 Feb 2015 |
| Subjects: | Efficient exploration; probably approximately correct (PAC); state aggregation; reinforcement learning (RL) |
| ISSN: | 2162-237X (print); 2162-2388 (electronic) |
| Online access: | Get full text |
| Abstract | In this paper, the first probably approximately correct (PAC) algorithm for continuous deterministic systems that does not rely on any system dynamics is proposed. It combines the state aggregation technique with the efficient-exploration principle and makes efficient use of online observed samples. We use a grid to partition the continuous state space into cells in which samples are stored. A near-upper Q operator is defined to produce a near-upper Q function from the samples in each cell. The corresponding greedy policy effectively balances exploration and exploitation. Through rigorous analysis, we prove a polynomial bound on the number of nonoptimal actions executed by our algorithm. After finitely many steps, the final policy is near optimal in the PAC framework. The implementation requires no knowledge of the system and has low computational complexity. Simulation studies confirm that it performs better than other similar PAC algorithms. |
|---|---|
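The grid-based state aggregation and near-upper (optimistic) Q values described in the abstract can be illustrated with a minimal sketch. This is not the paper's MEC implementation: the class and function names, state bounds, discount factor, and reward bound below are all illustrative assumptions. Unvisited cell-action pairs keep an upper-bound value, so the greedy policy is automatically drawn toward unexplored cells.

```python
from collections import defaultdict

GAMMA = 0.9                     # discount factor (assumed)
R_MAX = 1.0                     # assumed bound on one-step reward
Q_MAX = R_MAX / (1.0 - GAMMA)   # optimistic (near-upper) initial value

def cell_of(state, low, high, bins):
    """Map a continuous state vector to a grid-cell index tuple."""
    idx = []
    for s, lo, hi, b in zip(state, low, high, bins):
        # clamp into [lo, hi), then bucket into one of b cells per dimension
        t = min(max((s - lo) / (hi - lo), 0.0), 1.0 - 1e-12)
        idx.append(int(t * b))
    return tuple(idx)

class GridQ:
    """Per-cell Q estimates with optimistic initialization: every
    unvisited (cell, action) pair starts at Q_MAX, which drives
    exploration through the greedy policy."""
    def __init__(self, low, high, bins, actions):
        self.low, self.high, self.bins = low, high, bins
        self.actions = actions
        self.q = defaultdict(lambda: Q_MAX)   # (cell, action) -> value

    def greedy_action(self, state):
        c = cell_of(state, self.low, self.high, self.bins)
        return max(self.actions, key=lambda a: self.q[(c, a)])

    def update(self, state, action, reward, next_state):
        """One observed deterministic transition tightens the upper value."""
        c = cell_of(state, self.low, self.high, self.bins)
        nc = cell_of(next_state, self.low, self.high, self.bins)
        target = reward + GAMMA * max(self.q[(nc, a)] for a in self.actions)
        # deterministic dynamics: keep the smaller (tighter) upper bound
        self.q[(c, action)] = min(self.q[(c, action)], target)
```

A finer grid (larger `bins`) gives a tighter aggregation at the cost of more cells to explore, which is the resolution/sample trade-off behind the paper's polynomial bound on nonoptimal actions.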
| Author | Dongbin Zhao; Yuanheng Zhu |
| CODEN | ITNNAL |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Feb 2015 |
| DOI | 10.1109/TNNLS.2014.2371046 |
| Discipline | Computer Science |
| EISSN | 2162-2388 |
| EndPage | 356 |
| Genre | orig-research Research Support, Non-U.S. Gov't Journal Article |
| GrantInformation | National Natural Science Foundation of China (61273136, 61034002); Natural Science Foundation of Beijing (4122083) |
| ISICitedReferencesCount | 63 |
| ISSN | 2162-237X 2162-2388 |
| Issue | 2 |
| Keywords | Efficient exploration; probably approximately correct (PAC); state aggregation; reinforcement learning (RL) |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html |
| PMID | 25474812 |
| PageCount | 11 |
| PublicationDate | 2015-02-01 |
| PublicationPlace | United States |
| PublicationTitle | IEEE Transactions on Neural Networks and Learning Systems |
| PublicationTitleAbbrev | TNNLS |
| PublicationTitleAlternate | IEEE Trans Neural Netw Learn Syst |
| PublicationYear | 2015 |
| Publisher | IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| StartPage | 346 |
| SubjectTerms | Algorithm design and analysis; Algorithms; Approximation algorithms; Computer simulation; Dynamical systems; Dynamics; Efficient exploration; Exploration; Heuristic algorithms; Learning; Learning systems; Mathematical analysis; Neural networks; Partitioning algorithms; Policies; Polynomials; probably approximately correct (PAC); reinforcement learning (RL); state aggregation; Upper bound |
| Title | MEC: A Near-Optimal Online Reinforcement Learning Algorithm for Continuous Deterministic Systems |
| URI | https://ieeexplore.ieee.org/document/6971146 https://www.ncbi.nlm.nih.gov/pubmed/25474812 https://www.proquest.com/docview/1647300042 https://www.proquest.com/docview/1652395418 https://www.proquest.com/docview/1669904910 |
| Volume | 26 |