Rank-1 transition uncertainties in constrained Markov decision processes
| Published in: | European journal of operational research, Vol. 318, no. 1, pp. 167-178 |
|---|---|
| Main authors: | , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V, 1 October 2024 |
| ISSN: | 0377-2217, 1872-6860 |
| Summary: | We consider an infinite-horizon discounted constrained Markov decision process (CMDP) with uncertain transition probabilities. We assume that the uncertainty in the transition probabilities has a rank-1 matrix structure and that the underlying uncertain parameters belong to a polytope. We formulate the uncertain CMDP problem using a robust optimization framework. To derive a reformulation of the robust CMDP problem, we restrict attention to the class of stationary policies and show that the restricted problem is equivalent to a bilinear programming problem. We provide a simple example in which a Markov policy performs better than the optimal policy in the class of stationary policies, implying that, unlike in the classical CMDP problem, an optimal policy of the robust CMDP problem need not lie in the class of stationary policies. For the case of a single uncertain parameter, we propose sufficient conditions under which an optimal policy of the restricted robust CMDP problem is unaffected by the uncertainty. Numerical experiments are performed on randomly generated instances of a machine replacement problem and of a well-known class of problems called Garnets.
• We study a robust constrained Markov decision process under uncertain transitions. • We show its equivalence to a bilinear programming problem. • Under certain conditions, an optimal policy is unaffected by the uncertain parameters. • Numerical experiments are performed on a machine replacement problem and on Garnets. |
|---|---|
| DOI: | 10.1016/j.ejor.2024.04.023 |