Natural actor–critic algorithms

We present four new reinforcement learning algorithms based on actor–critic, natural-gradient and function-approximation ideas, and we provide their convergence proofs. Actor–critic reinforcement learning methods are online approximations to policy iteration in which the value-function parameters ar...

Bibliographic details
Published in:	Automatica (Oxford), Vol. 45, No. 11, pp. 2471-2482
Main authors:	Bhatnagar, Shalabh; Sutton, Richard S.; Ghavamzadeh, Mohammad; Lee, Mark
Format:	Journal Article
Language:	English
Published:	Kidlington: Elsevier Ltd, 1 November 2009
Subjects:	Actor–critic reinforcement learning algorithms, Algorithms, Applied sciences, Approximate dynamic programming, Artificial intelligence, Cognitive science, Computer science, Computer science; control theory; systems, Convergence, Exact sciences and technology, Function approximation, Learning, Natural gradient, Parametrization, Policies, Policy-gradient methods, Reinforcement, Temporal difference learning, Temporal logic, Two-timescale stochastic approximation, Variance, Natural gradient algorithms, Probabilistic approach, Reinforcement learning, Empirical method, Stochastic approximation, State space method, Parameterization, Interest, Gradient descent, Value function, Actor-critic reinforcement learning, Dynamic programming, Compatibility, Learning algorithm, Gradient method
ISSN:	0005-1098, 1873-2836
Online access:	Full text