An Adaptive Sampling Algorithm for Solving Markov Decision Processes


Detailed Bibliography
Published in: Operations Research, Vol. 53, No. 1, pp. 126-139
Main Authors: Chang, Hyeong Soo; Fu, Michael C.; Hu, Jiaqiao; Marcus, Steven I.
Format: Journal Article
Language: English
Publication Details: Linthicum, MD: Institute for Operations Research and the Management Sciences (INFORMS), 01.01.2005
ISSN: 0030-364X, 1526-5463
Description
Summary: Based on recent results for multiarmed bandit problems, we propose an adaptive sampling algorithm that approximates the optimal value of a finite-horizon Markov decision process (MDP) with finite state and action spaces. The algorithm adaptively chooses which action to sample as the sampling process proceeds and generates an asymptotically unbiased estimator, whose bias is bounded by a quantity that converges to zero at rate (ln N)/N, where N is the total number of samples that are used per state sampled in each stage. The worst-case running-time complexity of the algorithm is O((|A|N)^H), independent of the size of the state space, where |A| is the size of the action space and H is the horizon length. The algorithm can be used to create an approximate receding horizon control to solve infinite-horizon MDPs. To illustrate the algorithm, computational results are reported on simple examples from inventory control.
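For context only (not taken from the paper): the summary describes a bandit-driven, recursively applied sampling scheme, and the minimal Python sketch below shows one way such an estimator could look. It assumes a generative model exposed through hypothetical sample_next and reward functions and a standard UCB1-style exploration bonus; the authors' actual pseudocode and constants may differ.

import math

def adaptive_sampling_value(state, stage, H, actions, sample_next, reward, N):
    """Estimate the optimal value of `state` at the given stage using N samples.

    sample_next(state, action) and reward(state, action) stand in for the
    caller's generative model of the MDP; both names are placeholders.
    Assumes N >= len(actions).
    """
    if stage == H:                       # end of the horizon: terminal value 0
        return 0.0

    counts = {a: 0 for a in actions}     # samples spent on each action
    totals = {a: 0.0 for a in actions}   # running sum of sampled Q-values

    def sample(a):
        nxt = sample_next(state, a)
        q = reward(state, a) + adaptive_sampling_value(
            nxt, stage + 1, H, actions, sample_next, reward, N)
        counts[a] += 1
        totals[a] += q

    for a in actions:                    # initialization: one sample per action
        sample(a)

    for n in range(len(actions), N):     # remaining budget, allocated UCB-style
        bonus = lambda a: math.sqrt(2.0 * math.log(n) / counts[a])
        sample(max(actions, key=lambda a: totals[a] / counts[a] + bonus(a)))

    # Estimator: per-action sample means weighted by how often each action was
    # chosen; per the summary, the bias of this estimate vanishes at rate (ln N)/N.
    return sum((counts[a] / N) * (totals[a] / counts[a]) for a in actions)

Because the recursion only ever visits sampled successor states, the cost depends on |A|, N, and H but not on the size of the state space, consistent with the O((|A|N)^H) worst-case complexity stated in the summary.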
DOI: 10.1287/opre.1040.0145