Policy gradient in Lipschitz Markov Decision Processes

Bibliographic details
Published in:	Machine Learning, Volume 100, Issue 2-3, pp. 255-283
Main authors:	Pirotta, Matteo; Restelli, Marcello; Bascetta, Luca
Format:	Journal Article
Language:	English
Publication details:	New York: Springer US, 01.09.2015; Springer Nature B.V
Subjects:	Algorithms; Artificial Intelligence; Computer Science; Continuity; Control; Control systems; Decision making models; Learning; Machine learning; Markov analysis; Markov processes; Mathematical models; Mechatronics; Natural Language Processing (NLP); Policies; Robotics; Simulation and Modeling; Policy gradient algorithm; Markov Decision Process; Reinforcement learning; Lipschitz continuity
ISSN:	0885-6125, 1573-0565

Description
Summary:	This paper is about the exploitation of Lipschitz continuity properties for Markov Decision Processes to safely speed up policy-gradient algorithms. Starting from assumptions about the Lipschitz continuity of the state-transition model, the reward function, and the policies considered in the learning process, we show that both the expected return of a policy and its gradient are Lipschitz continuous w.r.t. policy parameters. By leveraging such properties, we define policy-parameter updates that guarantee a performance improvement at each iteration. The proposed methods are empirically evaluated and compared to other related approaches using different configurations of three popular control scenarios: the linear quadratic regulator, the mass-spring-damper system and the ship-steering control.
DOI:	10.1007/s10994-015-5484-1
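
Illustrative note: the summary states that a Lipschitz-continuous gradient allows parameter updates with guaranteed improvement. The sketch below shows only the generic smoothness argument behind that kind of guarantee, not the bounds or step sizes derived in the paper. If the gradient of J is Lipschitz with constant L, a gradient-ascent step of size 1/L increases J by at least ||grad J||^2 / (2L). The objective `J`, the constant `L`, and the helper `safe_step` are hypothetical choices made for this example.

```python
import math

# Hypothetical toy objective standing in for the expected return J(theta).
# It is NOT taken from the paper: J(theta) = -log(cosh(theta - 2)).
# Its second derivative lies in [-1, 0], so its gradient is Lipschitz with L = 1.
L = 1.0

def J(theta):
    return -math.log(math.cosh(theta - 2.0))

def grad_J(theta):
    return -math.tanh(theta - 2.0)

def safe_step(theta):
    """One gradient-ascent step with step size 1/L.

    Returns the new parameter and the improvement guaranteed by the
    standard smoothness lower bound, g^2 / (2L).
    """
    g = grad_J(theta)
    alpha = 1.0 / L
    guaranteed_gain = g * g / (2.0 * L)
    return theta + alpha * g, guaranteed_gain

theta = -3.0
history = []  # (actual improvement, guaranteed lower bound) per iteration
for _ in range(20):
    new_theta, bound = safe_step(theta)
    history.append((J(new_theta) - J(theta), bound))
    theta = new_theta
```

Each entry of `history` pairs the observed improvement with the lower bound, so the monotone-improvement property can be checked directly. In this toy setting L is known exactly; in a real policy-gradient problem the constant would have to be derived or bounded from the problem's assumptions.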