An Improved N-Step Value Gradient Learning Adaptive Dynamic Programming Algorithm for Online Learning
| Published in: | IEEE Transactions on Neural Networks and Learning Systems, Vol. 31, no. 4, pp. 1155-1169 |
|---|---|
| Main Authors: | Al-Dabooni, Seaar; Wunsch, Donald C. |
| Format: | Journal Article |
| Language: | English |
| Published: | United States: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.04.2020 |
| Subjects: | |
| ISSN: | 2162-237X, 2162-2388 |
| Abstract | In problems with complex dynamics and challenging state spaces, the dual heuristic programming (DHP) algorithm has been shown theoretically and experimentally to perform well. This was recently extended by an approach called value gradient learning (VGL). VGL was inspired by a version of temporal difference (TD) learning that uses eligibility traces. The eligibility traces create an exponential decay of older observations with a decay parameter (λ). This approach is known as TD(λ), and its DHP extension is known as VGL(λ), where VGL(0) is identical to DHP. VGL has demonstrated convergence and other desirable properties, but it is primarily useful for batch learning. Online learning requires an eligibility-trace-workspace matrix, which is not required for the batch-learning version of VGL. Since online learning is desirable for many applications, it is important to remove this computational and memory impediment. This paper introduces a dual-critic version of VGL, called N-step VGL (NSVGL), that does not need the eligibility-trace-workspace matrix, thereby allowing online learning. Furthermore, this combination of critic networks allows an NSVGL algorithm to learn faster. The first critic is similar to DHP and is adapted based on TD(0) learning, while the second critic is adapted based on a gradient of n-step TD(λ) learning. Both networks are combined to train an actor network. The combination of feedback signals from both critic networks provides an optimal decision faster than traditional adaptive dynamic programming (ADP) by mixing current information and event history. Convergence proofs are provided. Gradients of one- and n-step value functions are monotonically nondecreasing and converge to the optimum. Two simulation case studies are presented for NSVGL to show its superior performance. |
|---|---|
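The abstract's two core ingredients (the TD(λ) eligibility-trace update, where λ = 0 reduces to one-step TD just as VGL(0) reduces to DHP, and the n-step return) can be sketched in tabular form. This is an illustrative toy only: the function names and the tiny three-state chain are assumptions, and the paper itself applies these ideas to critic networks over value *gradients*, not tabular values.

```python
import numpy as np

def td_lambda_episode(transitions, n_states, alpha, gamma, lam):
    """Tabular TD(lambda) over one episode with accumulating eligibility traces.

    Each transition is (state, reward, next_state, done). With lam = 0 this
    reduces to one-step TD(0) -- the tabular analog of VGL(0) being DHP.
    """
    V = np.zeros(n_states)   # value estimates
    e = np.zeros(n_states)   # eligibility traces
    for s, r, s_next, done in transitions:
        target = r + (0.0 if done else gamma * V[s_next])
        delta = target - V[s]        # TD error
        e[s] += 1.0                  # mark the visited state as eligible
        V += alpha * delta * e       # credit all recently visited states
        e *= gamma * lam             # exponential decay controlled by lambda
    return V

def n_step_return(rewards, gamma, v_bootstrap):
    """n-step return: discounted sum of n rewards plus a bootstrapped tail."""
    n = len(rewards)
    return sum(gamma**k * r for k, r in enumerate(rewards)) + gamma**n * v_bootstrap
```

With λ = 0 only the current state is updated at each step; with λ = 1 credit from a terminal reward flows back along the whole trajectory. That backward flow is exactly the eligibility-trace effect which, in the online VGL setting, would otherwise require the eligibility-trace-workspace matrix that NSVGL eliminates.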
| Author | Al-Dabooni, Seaar Wunsch, Donald C. |
| Author_xml | – sequence: 1 givenname: Seaar orcidid: 0000-0001-5200-7587 surname: Al-Dabooni fullname: Al-Dabooni, Seaar email: cr7@ieee.org organization: Applied Computational Intelligence Laboratory (ACIL), Missouri University of Science and Technology, Rolla, MO, USA – sequence: 2 givenname: Donald C. orcidid: 0000-0002-9726-9051 surname: Wunsch fullname: Wunsch, Donald C. email: wunsch@ieee.org organization: Department of Electrical and Computer Engineering, Applied Computational Intelligence Laboratory (ACIL), Missouri University of Science and Technology, Rolla, MO, USA |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/31247567$$D View this record in MEDLINE/PubMed |
| CODEN | ITNNAL |
| CitedBy_id | crossref_primary_10_1109_TNNLS_2021_3116189 crossref_primary_10_1109_TNNLS_2024_3453385 crossref_primary_10_1109_TCYB_2023_3241344 crossref_primary_10_1109_TASE_2025_3585484 crossref_primary_10_1007_s11071_024_09524_9 crossref_primary_10_1002_rnc_6569 crossref_primary_10_1007_s10462_023_10497_1 crossref_primary_10_1109_TFUZZ_2023_3256441 crossref_primary_10_1109_ACCESS_2020_3043775 crossref_primary_10_1002_acs_3761 crossref_primary_10_1007_s10489_024_05933_w crossref_primary_10_1109_TNNLS_2023_3245102 crossref_primary_10_1109_TNNLS_2023_3245630 crossref_primary_10_1109_TCYB_2021_3107801 crossref_primary_10_1016_j_neucom_2021_10_065 crossref_primary_10_3390_robotics11050116 crossref_primary_10_1109_TCYB_2022_3198078 crossref_primary_10_1109_TNNLS_2022_3152268 crossref_primary_10_1002_rnc_7710 crossref_primary_10_1016_j_neucom_2024_129311 crossref_primary_10_1109_TCYB_2025_3562172 crossref_primary_10_1109_TNNLS_2021_3117790 crossref_primary_10_1007_s11071_024_10493_2 |
| Cites_doi | 10.1109/MCAS.2009.933854 10.1016/j.neunet.2012.02.005 10.1002/9781118025604 10.1109/TNNLS.2015.2424971 10.1109/TNNLS.2017.2654324 10.1109/TNNLS.2015.2490698 10.1109/TNNLS.2013.2247627 10.1109/TNN.2011.2147797 10.1109/TNNLS.2018.2875870 10.1007/BF00114726 10.1109/TNNLS.2016.2585520 10.3182/20060517-3-FR-2903.00330 10.1109/MCI.2009.932261 10.1109/TASE.2013.2284545 10.1109/72.701173 10.1109/TNNLS.2013.2292704 10.1162/089976600300015961 10.1109/TNN.1998.712192 10.1109/72.623201 10.1109/72.914523 10.1109/5.58337 10.1109/TNNLS.2013.2271454 10.1109/TNN.2009.2027233 10.1109/TIE.2014.2301770 10.1109/IJCNN.2017.7966204 10.1007/BF00115009 10.1109/TNNLS.2013.2283574 10.1109/TFUZZ.2015.2505327 10.1002/9781118122631 10.1016/j.automatica.2015.06.001 10.1002/9781118453988.ch3 10.1109/TSMCB.2008.926614 10.1109/IJCNN.2016.7727679 10.1109/TSMCB.2009.2025508 10.1109/TCYB.2014.2357896 10.1086/209106 10.1109/IJCNN.2012.6252791 10.1049/iet-cta:20050341 10.1109/TNNLS.2014.2329942 10.1007/s11768-011-1005-3 10.1016/0893-6080(90)90005-6 10.1109/TSMCB.2008.924141 10.1109/TNN.2008.2000396 10.1109/TNNLS.2013.2281663 10.1109/TSMCB.2012.2216523 10.1109/TNNLS.2013.2271778 10.1109/TNNLS.2019.2919614 10.1109/TAC.2016.2616644 10.1016/S0377-2217(98)00051-4 10.1016/j.neunet.2006.08.010 10.23919/ACC.1989.4790360 10.1109/TIA.2003.809438 10.1109/9780470544785 10.1016/j.neucom.2015.04.014 10.1016/j.neucom.2011.05.031 |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020 |
| Copyright_xml | – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020 |
| DOI | 10.1109/TNNLS.2019.2919338 |
| DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005–Present IEEE All-Society Periodicals Package (ASPP) 1998–Present IEEE/IET Electronic Library (IEL) (UW System Shared) CrossRef PubMed Aluminium Industry Abstracts Biotechnology Research Abstracts Calcium & Calcified Tissue Abstracts Ceramic Abstracts Chemoreception Abstracts Computer and Information Systems Abstracts Corrosion Abstracts Electronics & Communications Abstracts Engineered Materials Abstracts Materials Business File Mechanical & Transportation Engineering Abstracts Neurosciences Abstracts Solid State and Superconductivity Abstracts METADEX Technology Research Database ANTE: Abstracts in New Technology & Engineering Engineering Research Database Aerospace Database Materials Research Database ProQuest Computer Science Collection Civil Engineering Abstracts Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional Biotechnology and BioEngineering Abstracts MEDLINE - Academic |
| DatabaseTitle | CrossRef PubMed Materials Research Database Technology Research Database Computer and Information Systems Abstracts – Academic Mechanical & Transportation Engineering Abstracts ProQuest Computer Science Collection Computer and Information Systems Abstracts Materials Business File Aerospace Database Engineered Materials Abstracts Biotechnology Research Abstracts Chemoreception Abstracts Advanced Technologies Database with Aerospace ANTE: Abstracts in New Technology & Engineering Civil Engineering Abstracts Aluminium Industry Abstracts Electronics & Communications Abstracts Ceramic Abstracts Neurosciences Abstracts METADEX Biotechnology and BioEngineering Abstracts Computer and Information Systems Abstracts Professional Solid State and Superconductivity Abstracts Engineering Research Database Calcium & Calcified Tissue Abstracts Corrosion Abstracts MEDLINE - Academic |
| DatabaseTitleList | Materials Research Database PubMed MEDLINE - Academic |
| Database_xml | – sequence: 1 dbid: NPM name: PubMed url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 2 dbid: RIE name: IEEE/IET Electronic Library (IEL) (UW System Shared) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher – sequence: 3 dbid: 7X8 name: MEDLINE - Academic url: https://search.proquest.com/medline sourceTypes: Aggregation Database |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISSN | 2162-2388 |
| EndPage | 1169 |
| ExternalDocumentID | 31247567 10_1109_TNNLS_2019_2919338 8742790 |
| Genre | orig-research Journal Article |
| GrantInformation_xml | – fundername: Cooperative Agreement (The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein) grantid: W911NF-18-2-0260 – fundername: National Science Foundation funderid: 10.13039/501100008982 – fundername: Basra Oil Company (BOC), Iraq – fundername: Missouri University of Science and Technology Intelligent Systems Center funderid: 10.13039/100011535 – fundername: Lifelong Learning Machines Program from the DARPA/Microsystems Technology Office funderid: 10.13039/100000185 – fundername: Higher Committee for Educational Development (HCED) – fundername: Mary K. Finley Missouri Endowment – fundername: Army Research Laboratory (ARL) funderid: 10.13039/100006754 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 31 |
| ISSN | 2162-237X 2162-2388 |
| IngestDate | Thu Oct 02 10:27:01 EDT 2025 Sun Nov 09 06:20:29 EST 2025 Thu Jan 02 22:59:03 EST 2025 Sat Nov 29 01:40:03 EST 2025 Tue Nov 18 22:30:52 EST 2025 Wed Aug 27 02:42:21 EDT 2025 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | false |
| IsScholarly | true |
| Issue | 4 |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
| LinkModel | DirectLink |
| ORCID | 0000-0002-9726-9051 0000-0001-5200-7587 |
| PMID | 31247567 |
| PQID | 2387070648 |
| PQPubID | 85436 |
| PageCount | 15 |
| ParticipantIDs | ieee_primary_8742790 proquest_journals_2387070648 pubmed_primary_31247567 proquest_miscellaneous_2250617274 crossref_citationtrail_10_1109_TNNLS_2019_2919338 crossref_primary_10_1109_TNNLS_2019_2919338 |
| PublicationCentury | 2000 |
| PublicationDate | 2020-04-01 |
| PublicationDateYYYYMMDD | 2020-04-01 |
| PublicationDate_xml | – month: 04 year: 2020 text: 2020-04-01 day: 01 |
| PublicationDecade | 2020 |
| PublicationPlace | United States |
| PublicationPlace_xml | – name: United States – name: Piscataway |
| PublicationTitle | IEEE transaction on neural networks and learning systems |
| PublicationTitleAbbrev | TNNLS |
| PublicationTitleAlternate | IEEE Trans Neural Netw Learn Syst |
| PublicationYear | 2020 |
| Publisher | IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Publisher_xml | – name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| References | ref57 seijen (ref50) 2016; 145 ref56 ref12 ref59 ref15 ref58 ref14 ref53 ref52 ref55 ref10 ref17 ref16 sutton (ref41) 2016; 73 ref19 ref18 ref51 lewis (ref8) 2013 ref46 ref45 ref48 ref47 ref43 fu (ref5) 2011; 22 ref49 ref7 lewis (ref44) 2013 ref9 ref4 ref3 ref6 ref34 ref37 ref36 ref31 ref30 ref33 ref32 ref2 zhang (ref23) 2003 ref1 ref39 ref38 fairbank (ref60) 2011 liu (ref35) 2013; 43 bellman (ref11) 1957 ref24 seijen (ref42) 2014 werbos (ref13) 1992 ref26 xu (ref25) 2014; 25 ref64 ref20 van seijen (ref40) 2016; 145 ref63 ref22 ref65 ref21 ref28 ref27 ref29 ref62 ref61 ni (ref54) 2013; 24 |
| References_xml | – ident: ref2 doi: 10.1109/MCAS.2009.933854 – ident: ref32 doi: 10.1016/j.neunet.2012.02.005 – ident: ref29 doi: 10.1002/9781118025604 – ident: ref18 doi: 10.1109/TNNLS.2015.2424971 – ident: ref36 doi: 10.1109/TNNLS.2017.2654324 – ident: ref10 doi: 10.1109/TNNLS.2015.2490698 – ident: ref6 doi: 10.1109/TNNLS.2013.2247627 – volume: 22 start-page: 1133 year: 2011 ident: ref5 article-title: Adaptive learning and control for MIMO system based on adaptive dynamic programming publication-title: IEEE Trans Neural Netw doi: 10.1109/TNN.2011.2147797 – ident: ref53 doi: 10.1109/TNNLS.2018.2875870 – ident: ref49 doi: 10.1007/BF00114726 – ident: ref4 doi: 10.1109/TNNLS.2016.2585520 – ident: ref57 doi: 10.3182/20060517-3-FR-2903.00330 – ident: ref14 doi: 10.1109/MCI.2009.932261 – ident: ref9 doi: 10.1109/TASE.2013.2284545 – ident: ref63 doi: 10.1109/72.701173 – volume: 25 start-page: 635 year: 2014 ident: ref25 article-title: Reinforcement learning output feedback NN control using deterministic learning technique publication-title: IEEE Trans Neural Netw Learn Syst doi: 10.1109/TNNLS.2013.2292704 – ident: ref51 doi: 10.1162/089976600300015961 – ident: ref48 doi: 10.1109/TNN.1998.712192 – volume: 145 start-page: 1 year: 2016 ident: ref40 article-title: True Online Temporal-Difference Learning publication-title: J Mach Learn Res – ident: ref12 doi: 10.1109/72.623201 – ident: ref17 doi: 10.1109/72.914523 – ident: ref55 doi: 10.1109/5.58337 – volume: 24 start-page: 2038 year: 2013 ident: ref54 article-title: Goal representation heuristic dynamic programming on maze navigation publication-title: IEEE Trans Neural Netw Learn Syst doi: 10.1109/TNNLS.2013.2271454 – ident: ref16 doi: 10.1109/TNN.2009.2027233 – year: 2013 ident: ref8 publication-title: Reinforcement Learning and Approximate Dynamic Programming for Feedback Control – ident: ref26 doi: 10.1109/TIE.2014.2301770 – volume: 145 start-page: 1 year: 2016 ident: ref50 article-title: True online 
temporal-difference learning publication-title: J Mach Learn Res – ident: ref45 doi: 10.1109/IJCNN.2017.7966204 – ident: ref39 doi: 10.1007/BF00115009 – ident: ref61 doi: 10.1109/TNNLS.2013.2283574 – ident: ref33 doi: 10.1109/TFUZZ.2015.2505327 – volume: 73 start-page: 1 year: 2016 ident: ref41 article-title: An emphatic approach to the problem of off-policy temporal-difference learning publication-title: J Mach Learn Res – ident: ref46 doi: 10.1002/9781118122631 – year: 1957 ident: ref11 publication-title: Dynamic Programming – ident: ref31 doi: 10.1016/j.automatica.2015.06.001 – ident: ref38 doi: 10.1002/9781118453988.ch3 – ident: ref34 doi: 10.1109/TSMCB.2008.926614 – ident: ref24 doi: 10.1109/IJCNN.2016.7727679 – year: 1992 ident: ref13 article-title: Approximate dynamic programming for real-time control and neural modeling publication-title: Handbook of Intelligent Control Neural Fuzzy and Adaptive Approaches – ident: ref65 doi: 10.1109/TSMCB.2009.2025508 – ident: ref28 doi: 10.1109/TCYB.2014.2357896 – ident: ref58 doi: 10.1086/209106 – ident: ref43 doi: 10.1109/IJCNN.2012.6252791 – ident: ref64 doi: 10.1049/iet-cta:20050341 – ident: ref21 doi: 10.1109/TNNLS.2014.2329942 – ident: ref1 doi: 10.1007/s11768-011-1005-3 – ident: ref59 doi: 10.1016/0893-6080(90)90005-6 – ident: ref15 doi: 10.1109/TSMCB.2008.924141 – ident: ref62 doi: 10.1109/TNN.2008.2000396 – ident: ref30 doi: 10.1109/TNNLS.2013.2281663 – start-page: 248 year: 2003 ident: ref23 article-title: A comparison of dual heuristic programming (DHP) and neural network based stochastic optimization approach on collective robotic search problem publication-title: Proc Int Joint Conf Neural Netw (IJCNN) – volume: 43 start-page: 779 year: 2013 ident: ref35 article-title: Finite-approximation-error-based optimal control approach for discrete-time nonlinear systems publication-title: IEEE Trans Cybern doi: 10.1109/TSMCB.2012.2216523 – ident: ref47 doi: 10.1109/TNNLS.2013.2271778 – ident: ref52 doi: 
10.1109/TNNLS.2019.2919614 – ident: ref27 doi: 10.1109/TAC.2016.2616644 – year: 2013 ident: ref44 publication-title: Reinforcement Learning and Approximate Dynamic Programming for Feedback Control – ident: ref56 doi: 10.1016/S0377-2217(98)00051-4 – ident: ref3 doi: 10.1016/j.neunet.2006.08.010 – start-page: 692 year: 2014 ident: ref42 article-title: True online TD( $\lambda$ ) publication-title: Proc 31st Int Conf Mach Learn – year: 2011 ident: ref60 article-title: The local optimality of reinforcement learning by value gradients, and its relationship to policy gradient learning publication-title: arXiv 1101 0428 – ident: ref37 doi: 10.23919/ACC.1989.4790360 – ident: ref22 doi: 10.1109/TIA.2003.809438 – ident: ref7 doi: 10.1109/9780470544785 – ident: ref20 doi: 10.1016/j.neucom.2015.04.014 – ident: ref19 doi: 10.1016/j.neucom.2011.05.031 |
| SSID | ssj0000605649 |
| SourceID | proquest pubmed crossref ieee |
| SourceType | Aggregation Database Index Database Enrichment Source Publisher |
| StartPage | 1155 |
| SubjectTerms | Adaptive algorithms Adaptive dynamic programming (ADP) Adaptive learning Algorithms Computer applications Computer simulation Convergence convergence analysis Decay Distance learning Dynamic programming eligibility traces Heuristic algorithms Internet Learning systems Machine learning Networks online learning Optimal control Optimization Programming reinforcement learning Stability analysis temporal difference (TD) value gradient learning (VGL) |
| Title | An Improved N-Step Value Gradient Learning Adaptive Dynamic Programming Algorithm for Online Learning |
| URI | https://ieeexplore.ieee.org/document/8742790 https://www.ncbi.nlm.nih.gov/pubmed/31247567 https://www.proquest.com/docview/2387070648 https://www.proquest.com/docview/2250617274 |
| Volume | 31 |
| journalDatabaseRights | – providerCode: PRVIEE databaseName: IEEE/IET Electronic Library (IEL) (UW System Shared) customDbUrl: eissn: 2162-2388 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0000605649 issn: 2162-237X databaseCode: RIE dateStart: 20120101 isFulltext: true titleUrlDefault: https://ieeexplore.ieee.org/ providerName: IEEE |