A novel Q-learning algorithm with function approximation for constrained Markov decision processes

We present a novel multi-timescale Q-learning algorithm for average cost control in a Markov decision process subject to multiple inequality constraints. We formulate a relaxed version of this problem through the Lagrange multiplier method. Our algorithm is different from Q-learning in that it updat...
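
The Lagrangian relaxation mentioned in the abstract can be sketched in generic form. The notation below ($J$, $G_i$, $\alpha_i$, $\lambda_i$, $N$) is assumed for illustration and is not taken from the paper. It shows the standard construction for an average-cost problem with $N$ inequality constraints, not necessarily the authors' exact formulation.

```latex
\begin{aligned}
&\min_{\pi} \; J(\pi) \quad \text{subject to} \quad G_i(\pi) \le \alpha_i, \qquad i = 1, \dots, N, \\
&L(\pi, \lambda) = J(\pi) + \sum_{i=1}^{N} \lambda_i \bigl( G_i(\pi) - \alpha_i \bigr), \qquad \lambda_i \ge 0, \\
&\max_{\lambda \ge 0} \; \min_{\pi} \; L(\pi, \lambda).
\end{aligned}
```

Here $J(\pi)$ stands for the long-run average cost under policy $\pi$, $G_i(\pi)$ for the long-run average of the $i$-th constraint cost, and $\alpha_i$ for its threshold. In schemes of this kind the multipliers $\lambda_i$ are typically adjusted on a slower timescale than the value estimates, which is one common reason for a multi-timescale design.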

Bibliographic Details
Published in:	2012 50th Annual Allerton Conference on Communication, Control, and Computing, pp. 400-405
Main Authors:	Lakshmanan, K., Bhatnagar, S.
Format:	Conference Proceeding
Language:	English
Published:	IEEE 01.10.2012
Subjects:	Approximation algorithms; Constrained MDP; Function approximation; Lagrange multiplier method; Markov processes; Minimization; multi-stage stochastic shortest path problem; Q-learning with linear function approximation; reinforcement learning; Routing; Vectors; Zinc
ISBN:	9781467345378, 1467345377