Kernel-Based Reinforcement Learning


Detailed Bibliography
Published in: Machine Learning, Volume 49, Issue 2-3, pp. 161-178
Main Authors: Ormoneit, Dirk; Sen, Śaunak
Format: Journal Article
Language: English
Published: Dordrecht: Springer Nature B.V., 01.11.2002
ISSN: 0885-6125, 1573-0565
Description
Summary: We present a kernel-based approach to reinforcement learning that overcomes the stability problems of temporal-difference learning in continuous state-spaces. First, our algorithm converges to a unique solution of an approximate Bellman's equation regardless of its initialization values. Second, the method is consistent in the sense that the resulting policy converges asymptotically to the optimal policy. Parametric value function estimates such as neural networks do not possess this property. Our kernel-based approach also allows us to show that the limiting distribution of the value function estimate is a Gaussian process. This information is useful in studying the bias-variance tradeoff in reinforcement learning. We find that all reinforcement learning approaches to estimating the value function, parametric or non-parametric, are subject to a bias. This bias is typically larger in reinforcement learning than in a comparable regression problem.
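
The abstract describes approximating the Bellman operator by kernel-weighted averages over sampled transitions, so that approximate value iteration becomes a contraction with a unique fixed point regardless of initialization. The following is a minimal sketch of kernel-smoothed approximate Q-iteration in that spirit, assuming Gaussian kernel weights and one batch of sampled transitions per action; all names, data layouts, and parameters (gaussian_kernel, kernel_q_iteration, bandwidth, gamma) are illustrative assumptions, not the authors' published algorithm or code.

import numpy as np

def gaussian_kernel(s, centers, bandwidth):
    # Normalized Gaussian kernel weights of query state s against the sample states.
    d2 = np.sum((centers - s) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    return w / w.sum()

def kernel_q_iteration(samples, bandwidth=0.5, gamma=0.95, n_iters=100):
    # samples: dict mapping each action a to arrays (S, R, S_next) of shapes
    # (n_a, d), (n_a,), (n_a, d) holding sampled transitions (s, r, s').
    # Returns, for each action a, estimates of max_b Q(s'_i, b) at the sampled
    # successor states, from which a greedy policy can be read off.
    actions = list(samples.keys())
    V = {a: np.zeros(len(samples[a][1])) for a in actions}
    for _ in range(n_iters):
        V_new = {}
        for a in actions:
            _, _, S_next_a = samples[a]
            q_next = np.empty(len(S_next_a))
            for i, s_next in enumerate(S_next_a):
                # Kernel-averaged Bellman backup:
                # Q(s', b) ~= sum_j k(s', s_j^b) * (r_j^b + gamma * V[b][j])
                q_vals = []
                for b in actions:
                    S_b, R_b, _ = samples[b]
                    w = gaussian_kernel(s_next, S_b, bandwidth)
                    q_vals.append(w @ (R_b + gamma * V[b]))
                q_next[i] = max(q_vals)
            V_new[a] = q_next
        V = V_new
    return V

# Illustrative usage on a toy 1-D problem with random transitions (not real data):
rng = np.random.default_rng(0)
samples = {
    a: (rng.uniform(0, 1, (50, 1)),   # states s
        rng.uniform(0, 1, 50),        # rewards r
        rng.uniform(0, 1, (50, 1)))   # successor states s'
    for a in (0, 1)
}
V = kernel_q_iteration(samples)

At any query state, a greedy action can then be chosen by comparing the kernel-averaged backups across actions; the bandwidth plays the role of the smoothing parameter behind the bias-variance tradeoff mentioned in the abstract.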
DOI: 10.1023/A:1017928328829