An Improved N-Step Value Gradient Learning Adaptive Dynamic Programming Algorithm for Online Learning

In problems with complex dynamics and challenging state spaces, the dual heuristic programming (DHP) algorithm has been shown theoretically and experimentally to perform well. This was recently extended by an approach called value gradient learning (VGL). VGL was inspired by a version of temporal difference (TD) learning that uses eligibility traces.


Bibliographic Details
Published in: IEEE Transactions on Neural Networks and Learning Systems, Vol. 31, No. 4, pp. 1155-1169
Main Authors: Al-Dabooni, Seaar, Wunsch, Donald C.
Format: Journal Article
Language:English
Published: United States IEEE 01.04.2020
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:
ISSN: 2162-237X, 2162-2388
Abstract In problems with complex dynamics and challenging state spaces, the dual heuristic programming (DHP) algorithm has been shown theoretically and experimentally to perform well. This was recently extended by an approach called value gradient learning (VGL). VGL was inspired by a version of temporal difference (TD) learning that uses eligibility traces. The eligibility traces create an exponential decay of older observations with a decay parameter (λ). This approach is known as TD(λ), and its DHP extension is known as VGL(λ), where VGL(0) is identical to DHP. VGL has presented convergence and other desirable properties, but it is primarily useful for batch learning. Online learning requires an eligibility-trace-workspace matrix, which is not required for the batch learning version of VGL. Since online learning is desirable for many applications, it is important to remove this computational and memory impediment. This paper introduces a dual-critic version of VGL, called N-step VGL (NSVGL), that does not need the eligibility-trace-workspace matrix, thereby allowing online learning. Furthermore, this combination of critic networks allows an NSVGL algorithm to learn faster. The first critic is similar to DHP, which is adapted based on TD(0) learning, while the second critic is adapted based on a gradient of n-step TD(λ) learning. Both networks are combined to train an actor network. The combination of feedback signals from both critic networks provides an optimal decision faster than traditional adaptive dynamic programming (ADP) via mixing current information and event history. Convergence proofs are provided. Gradients of one- and n-step value functions are monotonically nondecreasing and converge to the optimum. Two simulation case studies are presented for NSVGL to show its superior performance.
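The eligibility-trace mechanism the abstract contrasts with TD(0) can be illustrated with a minimal tabular sketch: TD(λ) propagates each TD error back to earlier states through a trace that decays by γλ per step, while λ = 0 collapses to TD(0), mirroring the abstract's VGL(0) = DHP relationship. This is a generic textbook sketch, not the paper's NSVGL algorithm (which operates on value *gradients* with neural-network critics); the 3-state chain and all parameter values are invented for demonstration.

```python
import numpy as np

def td_lambda_episode(V, states, rewards, alpha=0.1, gamma=0.9, lam=0.5):
    """One episode of tabular TD(lambda) with accumulating traces.

    states[t] transitions to states[t+1] earning rewards[t];
    the final state in `states` is treated as terminal (value 0).
    """
    V = V.copy()
    e = np.zeros_like(V)                           # eligibility trace
    for t in range(len(rewards)):
        s, s_next = states[t], states[t + 1]
        v_next = 0.0 if t == len(rewards) - 1 else V[s_next]
        delta = rewards[t] + gamma * v_next - V[s]  # TD error at step t
        e[s] += 1.0                                 # accumulate trace for s
        V += alpha * delta * e                      # credit all traced states
        e *= gamma * lam                            # exponential decay (the λ of TD(λ))
    return V

# Chain 0 -> 1 -> 2 (terminal), reward 1 on the final step.
V = td_lambda_episode(np.zeros(3), states=[0, 1, 2], rewards=[0.0, 1.0])
print(V)   # state 0 already receives credit through its decayed trace

V0 = td_lambda_episode(np.zeros(3), states=[0, 1, 2], rewards=[0.0, 1.0], lam=0.0)
print(V0)  # with lam = 0 only the current state updates: plain TD(0)
```

With λ > 0, a single episode is enough for the terminal reward to reach state 0; with λ = 0 that credit would need further episodes to propagate, which is the learning-speed trade-off motivating the paper's combination of a TD(0)-style critic with an n-step TD(λ)-style critic.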
Author Al-Dabooni, Seaar
Wunsch, Donald C.
Author_xml – sequence: 1
  givenname: Seaar
  orcidid: 0000-0001-5200-7587
  surname: Al-Dabooni
  fullname: Al-Dabooni, Seaar
  email: cr7@ieee.org
  organization: Applied Computational Intelligence Laboratory (ACIL), Missouri University of Science and Technology, Rolla, MO, USA
– sequence: 2
  givenname: Donald C.
  orcidid: 0000-0002-9726-9051
  surname: Wunsch
  fullname: Wunsch, Donald C.
  email: wunsch@ieee.org
  organization: Department of Electrical and Computer Engineering, Applied Computational Intelligence Laboratory (ACIL), Missouri University of Science and Technology, Rolla, MO, USA
BackLink https://www.ncbi.nlm.nih.gov/pubmed/31247567$$D View this record in MEDLINE/PubMed
BookMark eNp9kV1rFDEUhoNUbK39AwoS8MabWfMx-bpcqtbCshVaxbuQmTmzpswk20ym0H9v2t3uRS8MhATyPIeT875FRyEGQOg9JQtKiflys16vrheMULNghhrO9St0wqhkFeNaHx3u6s8xOpumW1KWJELW5g065pTVSkh1gmAZ8OW4TfEeOryurjNs8W83zIAvkus8hIxX4FLwYYOXndtmfw_460Nwo2_xzxQ3yY3j0-OwicnnvyPuY8JXYfABDuo79Lp3wwRn-_MU_fr-7eb8R7W6urg8X66qlguaKzC80WUDCM5r07e0b5hrWO-aTrjG1AzqVtSUON0DcGFkI0knOuKgY1Izfoo-7-qWD93NMGU7-qmFYXAB4jxZxgSRVDFVF_TTC_Q2zimU7mwZoCKKyFoX6uOempsROrtNfnTpwT4PsAB6B7QpTlOC3rY-u-xjyMn5wVJiH-OyT3HZx7jsPq6ishfqc_X_Sh92kgeAg6BVzZQh_B-qfaAa
CODEN ITNNAL
CitedBy_id crossref_primary_10_1109_TNNLS_2021_3116189
crossref_primary_10_1109_TNNLS_2024_3453385
crossref_primary_10_1109_TCYB_2023_3241344
crossref_primary_10_1109_TASE_2025_3585484
crossref_primary_10_1007_s11071_024_09524_9
crossref_primary_10_1002_rnc_6569
crossref_primary_10_1007_s10462_023_10497_1
crossref_primary_10_1109_TFUZZ_2023_3256441
crossref_primary_10_1109_ACCESS_2020_3043775
crossref_primary_10_1002_acs_3761
crossref_primary_10_1007_s10489_024_05933_w
crossref_primary_10_1109_TNNLS_2023_3245102
crossref_primary_10_1109_TNNLS_2023_3245630
crossref_primary_10_1109_TCYB_2021_3107801
crossref_primary_10_1016_j_neucom_2021_10_065
crossref_primary_10_3390_robotics11050116
crossref_primary_10_1109_TCYB_2022_3198078
crossref_primary_10_1109_TNNLS_2022_3152268
crossref_primary_10_1002_rnc_7710
crossref_primary_10_1016_j_neucom_2024_129311
crossref_primary_10_1109_TCYB_2025_3562172
crossref_primary_10_1109_TNNLS_2021_3117790
crossref_primary_10_1007_s11071_024_10493_2
Cites_doi 10.1109/MCAS.2009.933854
10.1016/j.neunet.2012.02.005
10.1002/9781118025604
10.1109/TNNLS.2015.2424971
10.1109/TNNLS.2017.2654324
10.1109/TNNLS.2015.2490698
10.1109/TNNLS.2013.2247627
10.1109/TNN.2011.2147797
10.1109/TNNLS.2018.2875870
10.1007/BF00114726
10.1109/TNNLS.2016.2585520
10.3182/20060517-3-FR-2903.00330
10.1109/MCI.2009.932261
10.1109/TASE.2013.2284545
10.1109/72.701173
10.1109/TNNLS.2013.2292704
10.1162/089976600300015961
10.1109/TNN.1998.712192
10.1109/72.623201
10.1109/72.914523
10.1109/5.58337
10.1109/TNNLS.2013.2271454
10.1109/TNN.2009.2027233
10.1109/TIE.2014.2301770
10.1109/IJCNN.2017.7966204
10.1007/BF00115009
10.1109/TNNLS.2013.2283574
10.1109/TFUZZ.2015.2505327
10.1002/9781118122631
10.1016/j.automatica.2015.06.001
10.1002/9781118453988.ch3
10.1109/TSMCB.2008.926614
10.1109/IJCNN.2016.7727679
10.1109/TSMCB.2009.2025508
10.1109/TCYB.2014.2357896
10.1086/209106
10.1109/IJCNN.2012.6252791
10.1049/iet-cta:20050341
10.1109/TNNLS.2014.2329942
10.1007/s11768-011-1005-3
10.1016/0893-6080(90)90005-6
10.1109/TSMCB.2008.924141
10.1109/TNN.2008.2000396
10.1109/TNNLS.2013.2281663
10.1109/TSMCB.2012.2216523
10.1109/TNNLS.2013.2271778
10.1109/TNNLS.2019.2919614
10.1109/TAC.2016.2616644
10.1016/S0377-2217(98)00051-4
10.1016/j.neunet.2006.08.010
10.23919/ACC.1989.4790360
10.1109/TIA.2003.809438
10.1109/9780470544785
10.1016/j.neucom.2015.04.014
10.1016/j.neucom.2011.05.031
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020
DBID 97E
RIA
RIE
AAYXX
CITATION
NPM
7QF
7QO
7QP
7QQ
7QR
7SC
7SE
7SP
7SR
7TA
7TB
7TK
7U5
8BQ
8FD
F28
FR3
H8D
JG9
JQ2
KR7
L7M
L~C
L~D
P64
7X8
DOI 10.1109/TNNLS.2019.2919338
DatabaseName IEEE Xplore (IEEE)
IEEE All-Society Periodicals Package (ASPP) 1998-Present
IEEE Electronic Library (IEL)
CrossRef
PubMed
Aluminium Industry Abstracts
Biotechnology Research Abstracts
Calcium & Calcified Tissue Abstracts
Ceramic Abstracts
Chemoreception Abstracts
Computer and Information Systems Abstracts
Corrosion Abstracts
Electronics & Communications Abstracts
Engineered Materials Abstracts
Materials Business File
Mechanical & Transportation Engineering Abstracts
Neurosciences Abstracts
Solid State and Superconductivity Abstracts
METADEX
Technology Research Database
ANTE: Abstracts in New Technology & Engineering
Engineering Research Database
Aerospace Database
Materials Research Database
ProQuest Computer Science Collection
Civil Engineering Abstracts
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
Biotechnology and BioEngineering Abstracts
MEDLINE - Academic
DatabaseTitle CrossRef
PubMed
Materials Research Database
Technology Research Database
Computer and Information Systems Abstracts – Academic
Mechanical & Transportation Engineering Abstracts
ProQuest Computer Science Collection
Computer and Information Systems Abstracts
Materials Business File
Aerospace Database
Engineered Materials Abstracts
Biotechnology Research Abstracts
Chemoreception Abstracts
Advanced Technologies Database with Aerospace
ANTE: Abstracts in New Technology & Engineering
Civil Engineering Abstracts
Aluminium Industry Abstracts
Electronics & Communications Abstracts
Ceramic Abstracts
Neurosciences Abstracts
METADEX
Biotechnology and BioEngineering Abstracts
Computer and Information Systems Abstracts Professional
Solid State and Superconductivity Abstracts
Engineering Research Database
Calcium & Calcified Tissue Abstracts
Corrosion Abstracts
MEDLINE - Academic
DatabaseTitleList PubMed
Materials Research Database

MEDLINE - Academic
Database_xml – sequence: 1
  dbid: NPM
  name: PubMed
  url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
– sequence: 2
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
– sequence: 3
  dbid: 7X8
  name: MEDLINE - Academic
  url: https://search.proquest.com/medline
  sourceTypes: Aggregation Database
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISSN 2162-2388
EndPage 1169
ExternalDocumentID 31247567
10_1109_TNNLS_2019_2919338
8742790
Genre orig-research
Journal Article
GrantInformation_xml – fundername: Cooperative Agreement (The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein)
  grantid: W911NF-18-2-0260
– fundername: National Science Foundation
  funderid: 10.13039/501100008982
– fundername: Basra Oil Company (BOC), Iraq
– fundername: Missouri University of Science and Technology Intelligent Systems Center
  funderid: 10.13039/100011535
– fundername: Lifelong Learning Machines Program from the DARPA/Microsystems Technology Office
  funderid: 10.13039/100000185
– fundername: Higher Committee for Educational Development (HCED)
– fundername: Mary K. Finley Missouri Endowment
– fundername: Army Research Laboratory (ARL)
  funderid: 10.13039/100006754
GroupedDBID 0R~
4.4
5VS
6IK
97E
AAJGR
AARMG
AASAJ
AAWTH
ABAZT
ABQJQ
ABVLG
ACIWK
ACPRK
AENEX
AFRAH
AGQYO
AGSQL
AHBIQ
AKJIK
AKQYR
ALMA_UNASSIGNED_HOLDINGS
ATWAV
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
EBS
EJD
IFIPE
IPLJI
JAVBF
M43
MS~
O9-
OCL
PQQKQ
RIA
RIE
RNS
AAYXX
CITATION
NPM
RIG
7QF
7QO
7QP
7QQ
7QR
7SC
7SE
7SP
7SR
7TA
7TB
7TK
7U5
8BQ
8FD
F28
FR3
H8D
JG9
JQ2
KR7
L7M
L~C
L~D
P64
7X8
ID FETCH-LOGICAL-c351t-e93b893bee53349fc1fb2ab2fabd5ab942e4c5410a8fee3596b60d5d0aed26823
IEDL.DBID RIE
ISICitedReferencesCount 31
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000525351800008&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
ISSN 2162-237X
2162-2388
IngestDate Thu Oct 02 10:27:01 EDT 2025
Sun Nov 09 06:20:29 EST 2025
Thu Jan 02 22:59:03 EST 2025
Sat Nov 29 01:40:03 EST 2025
Tue Nov 18 22:30:52 EST 2025
Wed Aug 27 02:42:21 EDT 2025
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed false
IsScholarly true
Issue 4
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-c351t-e93b893bee53349fc1fb2ab2fabd5ab942e4c5410a8fee3596b60d5d0aed26823
Notes ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
content type line 23
ORCID 0000-0002-9726-9051
0000-0001-5200-7587
PMID 31247567
PQID 2387070648
PQPubID 85436
PageCount 15
ParticipantIDs ieee_primary_8742790
proquest_journals_2387070648
pubmed_primary_31247567
proquest_miscellaneous_2250617274
crossref_citationtrail_10_1109_TNNLS_2019_2919338
crossref_primary_10_1109_TNNLS_2019_2919338
PublicationCentury 2000
PublicationDate 2020-04-01
PublicationDateYYYYMMDD 2020-04-01
PublicationDate_xml – month: 04
  year: 2020
  text: 2020-04-01
  day: 01
PublicationDecade 2020
PublicationPlace United States
PublicationPlace_xml – name: United States
– name: Piscataway
PublicationTitle IEEE Transactions on Neural Networks and Learning Systems
PublicationTitleAbbrev TNNLS
PublicationTitleAlternate IEEE Trans Neural Netw Learn Syst
PublicationYear 2020
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
References ref57
seijen (ref50) 2016; 145
ref56
ref12
ref59
ref15
ref58
ref14
ref53
ref52
ref55
ref10
ref17
ref16
sutton (ref41) 2016; 73
ref19
ref18
ref51
lewis (ref8) 2013
ref46
ref45
ref48
ref47
ref43
fu (ref5) 2011; 22
ref49
ref7
lewis (ref44) 2013
ref9
ref4
ref3
ref6
ref34
ref37
ref36
ref31
ref30
ref33
ref32
ref2
zhang (ref23) 2003
ref1
ref39
ref38
fairbank (ref60) 2011
liu (ref35) 2013; 43
bellman (ref11) 1957
ref24
seijen (ref42) 2014
werbos (ref13) 1992
ref26
xu (ref25) 2014; 25
ref64
ref20
van seijen (ref40) 2016; 145
ref63
ref22
ref65
ref21
ref28
ref27
ref29
ref62
ref61
ni (ref54) 2013; 24
References_xml – ident: ref2
  doi: 10.1109/MCAS.2009.933854
– ident: ref32
  doi: 10.1016/j.neunet.2012.02.005
– ident: ref29
  doi: 10.1002/9781118025604
– ident: ref18
  doi: 10.1109/TNNLS.2015.2424971
– ident: ref36
  doi: 10.1109/TNNLS.2017.2654324
– ident: ref10
  doi: 10.1109/TNNLS.2015.2490698
– ident: ref6
  doi: 10.1109/TNNLS.2013.2247627
– volume: 22
  start-page: 1133
  year: 2011
  ident: ref5
  article-title: Adaptive learning and control for MIMO system based on adaptive dynamic programming
  publication-title: IEEE Trans Neural Netw
  doi: 10.1109/TNN.2011.2147797
– ident: ref53
  doi: 10.1109/TNNLS.2018.2875870
– ident: ref49
  doi: 10.1007/BF00114726
– ident: ref4
  doi: 10.1109/TNNLS.2016.2585520
– ident: ref57
  doi: 10.3182/20060517-3-FR-2903.00330
– ident: ref14
  doi: 10.1109/MCI.2009.932261
– ident: ref9
  doi: 10.1109/TASE.2013.2284545
– ident: ref63
  doi: 10.1109/72.701173
– volume: 25
  start-page: 635
  year: 2014
  ident: ref25
  article-title: Reinforcement learning output feedback NN control using deterministic learning technique
  publication-title: IEEE Trans Neural Netw Learn Syst
  doi: 10.1109/TNNLS.2013.2292704
– ident: ref51
  doi: 10.1162/089976600300015961
– ident: ref48
  doi: 10.1109/TNN.1998.712192
– volume: 145
  start-page: 1
  year: 2016
  ident: ref40
  article-title: True Online Temporal-Difference Learning
  publication-title: J Mach Learn Res
– ident: ref12
  doi: 10.1109/72.623201
– ident: ref17
  doi: 10.1109/72.914523
– ident: ref55
  doi: 10.1109/5.58337
– volume: 24
  start-page: 2038
  year: 2013
  ident: ref54
  article-title: Goal representation heuristic dynamic programming on maze navigation
  publication-title: IEEE Trans Neural Netw Learn Syst
  doi: 10.1109/TNNLS.2013.2271454
– ident: ref16
  doi: 10.1109/TNN.2009.2027233
– year: 2013
  ident: ref8
  publication-title: Reinforcement Learning and Approximate Dynamic Programming for Feedback Control
– ident: ref26
  doi: 10.1109/TIE.2014.2301770
– volume: 145
  start-page: 1
  year: 2016
  ident: ref50
  article-title: True online temporal-difference learning
  publication-title: J Mach Learn Res
– ident: ref45
  doi: 10.1109/IJCNN.2017.7966204
– ident: ref39
  doi: 10.1007/BF00115009
– ident: ref61
  doi: 10.1109/TNNLS.2013.2283574
– ident: ref33
  doi: 10.1109/TFUZZ.2015.2505327
– volume: 73
  start-page: 1
  year: 2016
  ident: ref41
  article-title: An emphatic approach to the problem of off-policy temporal-difference learning
  publication-title: J Mach Learn Res
– ident: ref46
  doi: 10.1002/9781118122631
– year: 1957
  ident: ref11
  publication-title: Dynamic Programming
– ident: ref31
  doi: 10.1016/j.automatica.2015.06.001
– ident: ref38
  doi: 10.1002/9781118453988.ch3
– ident: ref34
  doi: 10.1109/TSMCB.2008.926614
– ident: ref24
  doi: 10.1109/IJCNN.2016.7727679
– year: 1992
  ident: ref13
  article-title: Approximate dynamic programming for real-time control and neural modeling
  publication-title: Handbook of Intelligent Control Neural Fuzzy and Adaptive Approaches
– ident: ref65
  doi: 10.1109/TSMCB.2009.2025508
– ident: ref28
  doi: 10.1109/TCYB.2014.2357896
– ident: ref58
  doi: 10.1086/209106
– ident: ref43
  doi: 10.1109/IJCNN.2012.6252791
– ident: ref64
  doi: 10.1049/iet-cta:20050341
– ident: ref21
  doi: 10.1109/TNNLS.2014.2329942
– ident: ref1
  doi: 10.1007/s11768-011-1005-3
– ident: ref59
  doi: 10.1016/0893-6080(90)90005-6
– ident: ref15
  doi: 10.1109/TSMCB.2008.924141
– ident: ref62
  doi: 10.1109/TNN.2008.2000396
– ident: ref30
  doi: 10.1109/TNNLS.2013.2281663
– start-page: 248
  year: 2003
  ident: ref23
  article-title: A comparison of dual heuristic programming (DHP) and neural network based stochastic optimization approach on collective robotic search problem
  publication-title: Proc Int Joint Conf Neural Netw (IJCNN)
– volume: 43
  start-page: 779
  year: 2013
  ident: ref35
  article-title: Finite-approximation-error-based optimal control approach for discrete-time nonlinear systems
  publication-title: IEEE Trans Cybern
  doi: 10.1109/TSMCB.2012.2216523
– ident: ref47
  doi: 10.1109/TNNLS.2013.2271778
– ident: ref52
  doi: 10.1109/TNNLS.2019.2919614
– ident: ref27
  doi: 10.1109/TAC.2016.2616644
– year: 2013
  ident: ref44
  publication-title: Reinforcement Learning and Approximate Dynamic Programming for Feedback Control
– ident: ref56
  doi: 10.1016/S0377-2217(98)00051-4
– ident: ref3
  doi: 10.1016/j.neunet.2006.08.010
– start-page: 692
  year: 2014
  ident: ref42
  article-title: True online TD($\lambda$)
  publication-title: Proc 31st Int Conf Mach Learn
– year: 2011
  ident: ref60
  article-title: The local optimality of reinforcement learning by value gradients, and its relationship to policy gradient learning
  publication-title: arXiv 1101 0428
– ident: ref37
  doi: 10.23919/ACC.1989.4790360
– ident: ref22
  doi: 10.1109/TIA.2003.809438
– ident: ref7
  doi: 10.1109/9780470544785
– ident: ref20
  doi: 10.1016/j.neucom.2015.04.014
– ident: ref19
  doi: 10.1016/j.neucom.2011.05.031
SSID ssj0000605649
Score 2.482273
Snippet In problems with complex dynamics and challenging state spaces, the dual heuristic programming (DHP) algorithm has been shown theoretically and experimentally...
SourceID proquest
pubmed
crossref
ieee
SourceType Aggregation Database
Index Database
Enrichment Source
Publisher
StartPage 1155
SubjectTerms Adaptive algorithms
Adaptive dynamic programming (ADP)
Adaptive learning
Algorithms
Computer applications
Computer simulation
Convergence
convergence analysis
Decay
Distance learning
Dynamic programming
eligibility traces
Heuristic algorithms
Internet
Learning systems
Machine learning
Networks
online learning
Optimal control
Optimization
Programming
reinforcement learning
Stability analysis
temporal difference (TD)
value gradient learning (VGL)
Title An Improved N-Step Value Gradient Learning Adaptive Dynamic Programming Algorithm for Online Learning
URI https://ieeexplore.ieee.org/document/8742790
https://www.ncbi.nlm.nih.gov/pubmed/31247567
https://www.proquest.com/docview/2387070648
https://www.proquest.com/docview/2250617274
Volume 31
WOSCitedRecordID wos000525351800008&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVIEE
  databaseName: IEEE Electronic Library (IEL)
  customDbUrl:
  eissn: 2162-2388
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0000605649
  issn: 2162-237X
  databaseCode: RIE
  dateStart: 20120101
  isFulltext: true
  titleUrlDefault: https://ieeexplore.ieee.org/
  providerName: IEEE
link http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV1Lb9QwELZKxYELBcpjS6mMxA3SOnZix8cV9HFAq0oUtLfIj3FbaZtdbbP8_o7tbMQBkHqLZDuJ8s1MvrHnQcinytVSC6-LirNQoP-lChPQ5wGhwEhngw-52YSazZr5XF_ukC9jLgwApOAzOI6X6SzfL90mbpWdNOjHKY0O-hOlZM7VGvdTGPJymdguLyUvuFDzbY4M0ydXs9n3HzGQSx9zjZwlpqP88R9KjVX-zTHTv-Zs73Fv-YI8HzglnWYheEl2oHtF9rb9GuigvvsEph3NmwjgKVrSHlb0l1lsgJ6vU-RXT4dyq9d06s0qWkL6Lbesp5c5kOsuDS6ul-vb_uaOIuWluVrpuPQ1-Xl2evX1ohjaLBRO1GVfgBYWWYsFiGm5OrgyWG4sD8b62lhdcUBEq5KZJgCIWksrma89M-C5bLh4Q3a7ZQfvCNXS8CZoVDzPKmBKe99YIdHiaua8cRNSbj9664Ya5LEVxqJNvgjTbQKqjUC1A1AT8nlcs8oVOP47ez8iMs4cwJiQwy227aCk9y2yFYUWT1a46uM4jOoVz0xMB8sNzkGKmEheNSFvs0yM9xbIjVQt1cHfn_mePOPROU9hPodkt19v4AN56n73t_frI5TheXOUZPgBd0ntEA
linkProvider IEEE
linkToHtml http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV1Lb9QwELaqggQXChTKQgEjcYO0jp3Y8XEFlCKWqBIL2lvk2ONSaZtdbbP8fsZ2NuIASNwi-ZEon2f8jT0PQl4XtpRaOJ0VnPkM7S-VGY82DwgFRtrWO5-KTai6rhYLfbFH3o6xMAAQnc_gJDzGu3y3sttwVHZaoR2nNBrot8oC507RWuOJCkNmLiPf5bnkGRdqsYuSYfp0Xtezr8GVS59wjawlBKT8thPF0ip_Z5lxtzk7-L_vvE_uDaySTtMyeED2oHtIDnYVG-ggwIcEph1NxwjgKOrSHtb0u1lugX7cRN-vng4JVy_p1Jl10IX0fSpaTy-SK9d1bFxerjZX_Y9riqSXpnyl49BH5NvZh_m782wotJBZUeZ9Blq0yFtagBCYq73NfctNy71pXWlaXXBATIucmcoDiFLLVjJXOmbAcVlx8Zjsd6sOnhCqpeGV1yh6jhXAlHauaoVEnauZdcZOSL776Y0dspCHYhjLJlojTDcRqCYA1QxATcibccw65eD4Z-_DgMjYcwBjQo532DaDmN40yFcU6jxZ4KhXYzMKWLg1MR2sttgHSWKkecWEHKU1Mc4tkB2pUqqnf37nS3LnfP5l1sw-1Z-fkbs8mOrR6eeY7PebLTwnt-3P_upm8yKu5F9Y4-9v
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=An+Improved+N+-Step+Value+Gradient+Learning+Adaptive+Dynamic+Programming+Algorithm+for+Online+Learning&rft.jtitle=IEEE+transaction+on+neural+networks+and+learning+systems&rft.au=Al-Dabooni%2C+Seaar&rft.au=Wunsch%2C+Donald+C.&rft.date=2020-04-01&rft.issn=2162-237X&rft.eissn=2162-2388&rft.volume=31&rft.issue=4&rft.spage=1155&rft.epage=1169&rft_id=info:doi/10.1109%2FTNNLS.2019.2919338&rft.externalDBID=n%2Fa&rft.externalDocID=10_1109_TNNLS_2019_2919338
thumbnail_l http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/lc.gif&issn=2162-237X&client=summon
thumbnail_m http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/mc.gif&issn=2162-237X&client=summon
thumbnail_s http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/sc.gif&issn=2162-237X&client=summon