An Adaptive Sampling Algorithm for Solving Markov Decision Processes
Based on recent results for multiarmed bandit problems, we propose an adaptive sampling algorithm that approximates the optimal value of a finite-horizon Markov decision process (MDP) with finite state and action spaces. The algorithm adaptively chooses which action to sample as the sampling process...
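The abstract is truncated in this record, so the following is only a rough sketch of the general idea it describes, not the authors' exact procedure. It assumes a UCB1-style bandit index for choosing which action to sample next, one-stage rewards in [0, 1], and a fixed per-state sampling budget. The names `adaptive_value`, `step`, `budget`, and the two-action toy MDP are hypothetical, introduced for illustration.

```python
import math
import random


def adaptive_value(state, stage, horizon, actions, step, budget, rng):
    """Estimate the optimal value of `state` with `horizon - stage` stages left.

    Assumption: `step(state, action, rng)` is a simulator that returns
    (next_state, reward), with rewards in [0, 1].
    """
    if stage == horizon:
        return 0.0  # no stages left, so nothing more can be earned

    counts = {a: 0 for a in actions}    # times each action was sampled here
    totals = {a: 0.0 for a in actions}  # summed sampled returns per action

    def sample(a):
        next_state, reward = step(state, a, rng)
        future = adaptive_value(next_state, stage + 1, horizon,
                                actions, step, budget, rng)
        counts[a] += 1
        totals[a] += reward + future

    # Sample every action once so each running average is defined.
    for a in actions:
        sample(a)

    # Spend the remaining budget on the action with the largest
    # UCB1-style index: running average plus an exploration bonus.
    for n in range(len(actions), budget):
        def index(a):
            bonus = math.sqrt(2.0 * math.log(n) / counts[a])
            return totals[a] / counts[a] + bonus
        sample(max(actions, key=index))

    # Count-weighted average of the per-action estimates.
    return sum(totals.values()) / sum(counts.values())


# Hypothetical toy MDP: one state, action 1 pays 0.8, action 0 pays 0.2.
def toy_step(state, action, rng):
    return state, (0.8 if action == 1 else 0.2)


estimate = adaptive_value(state=0, stage=0, horizon=2, actions=[0, 1],
                          step=toy_step, budget=20, rng=random.Random(0))
print(estimate)
```

In this toy problem the true optimal two-stage value is 1.6. The estimate stays below that because the weighted average includes samples of the worse action. The recursion makes roughly `budget ** horizon` simulator calls, so the sketch is only practical for short horizons.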
| Published in: | Operations Research, Vol. 53, no. 1, pp. 126-139 |
|---|---|
| Main Authors: | , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Linthicum, MD: INFORMS (Institute for Operations Research and the Management Sciences), 01.01.2005 |
| ISSN: | 0030-364X, 1526-5463 |