An Equivalence Between Adaptive Dynamic Programming With a Critic and Backpropagation Through Time

We consider the adaptive dynamic programming technique called Dual Heuristic Programming (DHP), which is designed to learn a critic function, when using learned model functions of the environment. DHP is designed for optimizing control problems in large and continuous state spaces. We extend DHP int...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IEEE transaction on neural networks and learning systems Jg. 24; H. 12; S. 2088 - 2100
Hauptverfasser: Fairbank, Michael, Alonso, Eduardo, Prokhorov, Danil
Format: Journal Article
Sprache:Englisch
Veröffentlicht: New York, NY IEEE 01.12.2013
Institute of Electrical and Electronics Engineers
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Schlagworte:
ISSN:2162-237X, 2162-2388, 2162-2388
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:We consider the adaptive dynamic programming technique called Dual Heuristic Programming (DHP), which is designed to learn a critic function, when using learned model functions of the environment. DHP is designed for optimizing control problems in large and continuous state spaces. We extend DHP into a new algorithm that we call Value-Gradient Learning, VGL(λ), and prove equivalence of an instance of the new algorithm to Backpropagation Through Time for Control with a greedy policy. Not only does this equivalence provide a link between these two different approaches, but it also enables our variant of DHP to have guaranteed convergence, under certain smoothness conditions and a greedy policy, when using a general smooth nonlinear function approximator for the critic. We consider several experimental scenarios including some that prove divergence of DHP under a greedy policy, which contrasts against our proven-convergent algorithm.
Bibliographie:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ObjectType-Article-2
ObjectType-Feature-1
content type line 23
ISSN:2162-237X
2162-2388
2162-2388
DOI:10.1109/TNNLS.2013.2271778