Model-Free Dual Heuristic Dynamic Programming

Bibliographic Details
Published in:	IEEE Transactions on Neural Networks and Learning Systems, Vol. 26, No. 8, pp. 1834-1839
Main Authors:	Zhen Ni, Haibo He, Xiangnan Zhong, Danil V. Prokhorov
Format:	Journal Article
Language:	English
Published:	United States: IEEE, 1 August 2015
Subjects:	Action-dependent dual heuristic dynamic programming (DHP); adaptive critic designs (ACDs); adaptive dynamic programming (ADP); Approximation methods; Computational modeling; Convergence; Dynamic programming; Learning systems; Linear programming; Mathematical model; online learning; reinforcement learning
ISSN:	2162-237X, 2162-2388

Description
Summary:	Model-based dual heuristic dynamic programming (MB-DHP) is a popular approach for approximating optimal solutions in control problems. However, it usually requires offline training of the model network, which incurs extra computational cost. In this brief, we propose a model-free DHP (MF-DHP) design based on a finite-difference technique. In particular, we adopt a multilayer perceptron with one hidden layer for both the action and critic networks, and use delayed objective functions to train both networks online over time. We test the MF-DHP and MB-DHP approaches on a discrete-time example and a continuous-time example under the same parameter settings. Our simulation results demonstrate that the MF-DHP approach can achieve control performance competitive with that of the traditional MB-DHP approach while requiring fewer computational resources.
DOI:	10.1109/TNNLS.2015.2424971
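
Note on the finite-difference idea: the summary names a finite-difference technique as the replacement for the model network but does not say how it is applied. The sketch below is therefore only a generic illustration of approximating a state-transition Jacobian by central differences, not the authors' method. The function names `finite_difference_jacobian` and `step`, the step size `eps`, and the toy two-state system are all assumptions made for this example, and the sketch presumes the system can be queried at perturbed states, which an online controller may not be able to do.

```python
def finite_difference_jacobian(f, x, eps=1e-6):
    """Approximate the Jacobian of f at x by central differences.

    f maps a list of n floats to a list of m floats.
    Returns an m-by-n nested list J with J[i][j] ~ d f_i / d x_j.
    """
    n = len(x)
    m = len(f(x))
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        # Perturb only the j-th input coordinate in both directions.
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        f_plus = f(x_plus)
        f_minus = f(x_minus)
        for i in range(m):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2.0 * eps)
    return jac


def step(x):
    """Hypothetical discrete-time system, used only for illustration."""
    return [0.9 * x[0] + 0.1 * x[1], -0.2 * x[0] + x[1] ** 2]


# Estimate d x(t+1) / d x(t) at the state (1.0, 2.0).
J = finite_difference_jacobian(step, [1.0, 2.0])
```

For this toy system the analytic Jacobian at (1.0, 2.0) is [[0.9, 0.1], [-0.2, 4.0]], so the estimate can be checked directly. In a model-based design, a trained model network would typically supply such derivatives instead.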