A novel Q-learning algorithm with function approximation for constrained Markov decision processes

Bibliographic details
Published in:	2012 50th Annual Allerton Conference on Communication, Control, and Computing, pp. 400-405
Main authors:	Lakshmanan, K., Bhatnagar, S.
Format:	Conference paper
Language:	English
Published:	IEEE, 1 October 2012
Subjects:	Approximation algorithms; Constrained MDP; Function approximation; Lagrange multiplier method; Markov processes; Minimization; multi-stage stochastic shortest path problem; Q-learning with linear function approximation; reinforcement learning; Routing; Vectors; Zinc
ISBN:	9781467345378, 1467345377

Description
Summary:	We present a novel multi-timescale Q-learning algorithm for average cost control in a Markov decision process subject to multiple inequality constraints. We formulate a relaxed version of this problem through the Lagrange multiplier method. Our algorithm differs from Q-learning in that it updates two parameters: a Q-value parameter and a policy parameter. The Q-value parameter is updated on a slower timescale than the policy parameter. Whereas Q-learning with function approximation can diverge in some cases, our algorithm is seen to be convergent as a result of the aforementioned timescale separation. We show the results of experiments on a problem of constrained routing in a multistage queueing network. Our algorithm is seen to exhibit good performance and the various inequality constraints are seen to be satisfied upon convergence of the algorithm.
DOI:	10.1109/Allerton.2012.6483246
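
Illustration:	The summary mentions a Lagrangian relaxation and updates running on separate timescales. The sketch below is a generic illustration of those two ideas only, not the authors' algorithm: the function names, the projected-ascent multiplier rule, and the step-size exponents are all assumptions chosen for the example. It shows a relaxed single-stage cost c + sum_k lambda_k (g_k - alpha_k), a multiplier step kept non-negative by projection, and a pair of step-size schedules in which the Q-value step shrinks faster than the policy step, so the Q-value parameter moves on the slower timescale.

```python
from typing import List, Sequence


def relaxed_cost(cost: float,
                 constraint_costs: Sequence[float],
                 thresholds: Sequence[float],
                 multipliers: Sequence[float]) -> float:
    """Lagrangian single-stage cost: c + sum_k lambda_k * (g_k - alpha_k)."""
    penalty = sum(lam * (g - alpha)
                  for lam, g, alpha in zip(multipliers, constraint_costs, thresholds))
    return cost + penalty


def multiplier_step(multipliers: Sequence[float],
                    constraint_costs: Sequence[float],
                    thresholds: Sequence[float],
                    step_size: float) -> List[float]:
    """Hypothetical projected ascent: raise lambda_k when constraint k is
    violated (g_k > alpha_k), lower it otherwise, and clip at zero."""
    return [max(0.0, lam + step_size * (g - alpha))
            for lam, g, alpha in zip(multipliers, constraint_costs, thresholds)]


def policy_step(n: int) -> float:
    """Faster-timescale step size (exponent 0.6 is an assumed choice)."""
    return 1.0 / n ** 0.6


def q_value_step(n: int) -> float:
    """Slower-timescale step size: q_value_step(n) / policy_step(n) -> 0."""
    return 1.0 / n
```

With a single constraint, a violated threshold (observed constraint cost above alpha) pushes the multiplier up and makes that behaviour more expensive in the relaxed cost, while a satisfied constraint lets the multiplier decay back toward zero.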