Approximate Constrained Discounted Dynamic Programming With Uniform Feasibility and Optimality
| Published in: | IEEE Transactions on Automatic Control, Vol. 70, No. 6, pp. 4031-4036 |
|---|---|
| Main author: | |
| Format: | Journal Article |
| Language: | English |
| Published: | New York, IEEE, 1 June 2025, The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Subjects: | |
| ISSN: | 0018-9286, 1558-2523 |
| Summary: | An important question about the finite constrained Markov decision process (CMDP) problem is whether there exists a condition under which a uniformly optimal and uniformly feasible policy, one that achieves the optimal value at all initial states, exists in the set of deterministic, history-independent, stationary policies, and whether the CMDP problem under that condition can be solved by dynamic programming (DP). The question matters because the crux of the unconstrained MDP theory developed by Bellman lies in the answer to the same existence question for an optimal MDP policy. Although CMDPs have been studied for years, no work has responded to this open question since it was raised in the literature about three decades ago. We establish, as one answer to this question, that any finite CMDP problem $\mathsf{M}^{c}$ inherently "contains" a DP structure in its "subordinate" CMDP problem $\hat{\mathsf{M}}^{c}$, which is induced from the parameters of $\mathsf{M}^{c}$, and that $\hat{\mathsf{M}}^{c}$ is DP-solvable. We derive a policy-iteration-type algorithm for solving $\hat{\mathsf{M}}^{c}$, which provides an approximate solution to $\mathsf{M}^{c}$ or to $\mathsf{M}^{c}$ with a fixed initial state. |
| DOI: | 10.1109/TAC.2024.3523847 |
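
For readers unfamiliar with the terms in the summary, the following is a minimal sketch of classical policy iteration on an unconstrained discounted MDP, followed by a check of whether the resulting policy keeps a discounted constraint cost within a budget at every initial state. It is not the algorithm derived in the article. The two-state model, the names `P`, `R`, `C`, `GAMMA`, `evaluate`, `policy_iteration`, and `uniformly_feasible`, and the budget values are all hypothetical and chosen only for illustration.

```python
# Hypothetical toy model (not from the article).
# P[s][a] lists (next_state, probability) pairs, R[s][a] is the reward,
# and C[s][a] is the per-step constraint cost.
GAMMA = 0.9
P = {
    0: {0: [(0, 1.0)], 1: [(1, 1.0)]},
    1: {0: [(0, 1.0)], 1: [(1, 1.0)]},
}
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.0, 1: 2.0}}
C = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.0, 1: 1.0}}


def evaluate(policy, signal, gamma=GAMMA, tol=1e-10):
    """Discounted value of a stationary deterministic policy, by fixed-point iteration."""
    v = {s: 0.0 for s in P}
    while True:
        new_v = {
            s: signal[s][policy[s]]
            + gamma * sum(p * v[t] for t, p in P[s][policy[s]])
            for s in P
        }
        if max(abs(new_v[s] - v[s]) for s in P) < tol:
            return new_v
        v = new_v


def policy_iteration(gamma=GAMMA):
    """Classical policy iteration for the unconstrained reward-maximizing MDP."""
    policy = {s: min(P[s]) for s in P}  # arbitrary initial policy
    while True:
        v = evaluate(policy, R, gamma)
        improved = {}
        for s in P:
            q = {
                a: R[s][a] + gamma * sum(p * v[t] for t, p in P[s][a])
                for a in P[s]
            }
            # Keep the current action on (near) ties so the loop terminates.
            if q[policy[s]] >= max(q.values()) - 1e-9:
                improved[s] = policy[s]
            else:
                improved[s] = max(q, key=q.get)
        if improved == policy:
            return policy, v
        policy = improved


def uniformly_feasible(policy, budget, gamma=GAMMA):
    """True if the discounted constraint cost is within budget at every initial state."""
    cost = evaluate(policy, C, gamma)
    return all(cost[s] <= budget for s in P)


policy, value = policy_iteration()
```

In this toy model the unconstrained optimum happens to satisfy a loose budget at both states and to violate a tight one, which is the situation that motivates asking when a single stationary deterministic policy can be both optimal and feasible at all initial states.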