A novel Q-learning algorithm with function approximation for constrained Markov decision processes
We present a novel multi-timescale Q-learning algorithm for average cost control in a Markov decision process subject to multiple inequality constraints. We formulate a relaxed version of this problem through the Lagrange multiplier method. Our algorithm is different from Q-learning in that it updat...
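For orientation, the Lagrangian relaxation mentioned in the abstract is commonly written in the following generic form for a constrained average-cost problem. All symbols here are illustrative assumptions, not notation from the paper: $J(\pi)$ is the long-run average cost of policy $\pi$, $G_i(\pi)$ the long-run average of the $i$-th constraint cost, $\alpha_i$ its bound, and $\lambda_i$ the corresponding multiplier.

```latex
\begin{align}
  % Constrained problem (assumed generic form, N inequality constraints)
  &\min_{\pi} \; J(\pi)
    \quad \text{subject to} \quad G_i(\pi) \le \alpha_i, \qquad i = 1, \dots, N, \\
  % Relaxed problem: constraints folded into the objective via multipliers
  &L(\pi, \lambda) = J(\pi) + \sum_{i=1}^{N} \lambda_i \bigl( G_i(\pi) - \alpha_i \bigr),
    \qquad \lambda_i \ge 0 .
\end{align}
```

In this generic setting the relaxed problem is a saddle-point search: minimize $L$ over policies and maximize it over the multipliers.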
| Published in: | 2012 50th Annual Allerton Conference on Communication, Control, and Computing pp. 400 - 405 |
|---|---|
| Main Authors: | , |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 01.10.2012 |
| Subjects: | |
| ISBN: | 9781467345378, 1467345377 |

