MEC - A Near-Optimal Online Reinforcement Learning Algorithm for Continuous Deterministic Systems

Bibliographic details
Published in: IEEE Transactions on Neural Networks and Learning Systems, Vol. 26, No. 2, pp. 346-356
Authors: Zhao, Dongbin; Zhu, Yuanheng
Format: Journal Article
Language: English
Published: United States: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.02.2015
Keywords: Efficient exploration; probably approximately correct (PAC); state aggregation; reinforcement learning (RL)
ISSN: 2162-237X; EISSN: 2162-2388
DOI: 10.1109/TNNLS.2014.2371046
PMID: 25474812
Funding: National Natural Science Foundation of China (grants 61273136, 61034002); Natural Science Foundation of Beijing (grant 4122083)
Online access: Full text at https://ieeexplore.ieee.org/document/6971146 and https://www.ncbi.nlm.nih.gov/pubmed/25474812
Abstract
In this paper, the first probably approximately correct (PAC) algorithm for continuous deterministic systems that does not rely on any knowledge of the system dynamics is proposed. It combines the state aggregation technique with the efficient exploration principle, and makes high utilization of the samples observed online. We use a grid to partition the continuous state space into cells in which samples are stored. A near-upper Q operator is defined to produce a near-upper Q function from the samples in each cell. The corresponding greedy policy effectively balances exploration and exploitation. Through rigorous analysis, we prove a polynomial bound on the number of time steps in which the algorithm executes nonoptimal actions. After finitely many steps, the final policy is near optimal in the PAC framework. The implementation requires no knowledge of the system and has low computational complexity. Simulation studies confirm that it outperforms other similar PAC algorithms.
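For orientation, the PAC guarantee invoked in the abstract has the standard "sample complexity of exploration" form below. This is the generic criterion, not the paper's specific theorem; N stands here for the number of grid cells, gamma for the discount factor, and pi_t for the policy followed at step t.

\[
\Pr\!\left(\Bigl|\bigl\{\, t : V^{\pi_t}(s_t) < V^{*}(s_t) - \varepsilon \,\bigr\}\Bigr|
\;\le\; \mathrm{poly}\!\left(\frac{1}{\varepsilon},\, \ln\frac{1}{\delta},\, \frac{1}{1-\gamma},\, N\right)\right) \;\ge\; 1 - \delta
\]

In words: with probability at least 1 - delta, the agent executes nonoptimal (worse than epsilon-optimal) actions on at most polynomially many time steps, which is the kind of bound the abstract claims for the proposed algorithm.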
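The mechanism the abstract describes lends itself to a short sketch. The Python below is a minimal, hypothetical rendering, not the paper's algorithm: a grid aggregates the continuous state space, each cell-action pair keeps one online sample (enough under deterministic dynamics), and unvisited pairs hold an optimistic upper value V_MAX = R_MAX / (1 - gamma), the "near-upper" idea, so the greedy policy is pulled toward unexplored cells. All class names, parameter values, and the plain value-iteration backup are illustrative assumptions.

import numpy as np

GAMMA = 0.95                     # discount factor (assumed)
R_MAX = 1.0                      # assumed upper bound on rewards
V_MAX = R_MAX / (1.0 - GAMMA)    # optimistic ("near-upper") initial Q-value

class GridQLearner:
    """Hypothetical grid-aggregation learner illustrating the abstract's ideas."""

    def __init__(self, lows, highs, cells_per_dim, n_actions):
        self.lows = np.asarray(lows, dtype=float)
        self.highs = np.asarray(highs, dtype=float)
        self.cells = cells_per_dim
        self.dims = len(self.lows)
        # Near-upper Q function: every unvisited cell-action pair stays at V_MAX.
        self.Q = np.full((cells_per_dim ** self.dims, n_actions), V_MAX)
        self.samples = {}  # (cell, action) -> (reward, next_cell)

    def cell_of(self, state):
        # Map a continuous state to the index of its grid cell.
        ratios = (np.asarray(state, dtype=float) - self.lows) / (self.highs - self.lows)
        idx = np.clip((ratios * self.cells).astype(int), 0, self.cells - 1)
        return int(np.ravel_multi_index(idx, (self.cells,) * self.dims))

    def act(self, state):
        # Greedy w.r.t. the near-upper Q function: exploits where samples exist,
        # and is drawn toward still-optimistic (unexplored) cell-action pairs.
        return int(np.argmax(self.Q[self.cell_of(state)]))

    def observe(self, state, action, reward, next_state):
        # Deterministic dynamics: one sample per cell-action pair is kept.
        key = (self.cell_of(state), action)
        if key not in self.samples:
            self.samples[key] = (float(reward), self.cell_of(next_state))
            self._backup()

    def _backup(self):
        # Sweep value iteration over the stored samples; unsampled pairs keep
        # their optimistic value, which acts as an implicit exploration bonus.
        for _ in range(100):
            for (c, a), (r, c2) in self.samples.items():
                self.Q[c, a] = r + GAMMA * self.Q[c2].max()

A driver loop would call act(state), step the unknown plant, and pass the observed transition to observe(...). The grid resolution trades approximation error against sample cost; the paper's near-upper Q operator and its polynomial bound on nonoptimal steps are what this toy backup stands in for.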