Approximate Constrained Discounted Dynamic Programming With Uniform Feasibility and Optimality
| Published in: | IEEE Transactions on Automatic Control, Vol. 70, no. 6, pp. 4031-4036 |
|---|---|
| Main Author: | |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.06.2025 |
| Subjects: | |
| ISSN: | 0018-9286, 1558-2523 |
| Summary: | An important question about the finite constrained Markov decision process (CMDP) problem is whether there exists a condition under which a uniformly optimal and uniformly feasible policy, achieving the optimal value at all initial states, exists in the set of deterministic, history-independent, and stationary policies, and whether the CMDP problem under such a condition can be solved by dynamic programming (DP). This matters because the crux of the unconstrained MDP theory developed by Bellman lies in the answer to the same existence question for such an optimal policy in the MDP setting. Although CMDPs have been studied over the years, no work has addressed this question since it was posed as an open problem in the literature about three decades ago. We establish, as a partial answer to this question, that any finite CMDP problem $\mathsf{M}^{c}$ inherently "contains" a DP structure in its "subordinate" CMDP problem $\hat{\mathsf{M}}^{c}$, induced from the parameters of $\mathsf{M}^{c}$, and that $\hat{\mathsf{M}}^{c}$ is DP-solvable. We derive a policy-iteration-type algorithm for solving $\hat{\mathsf{M}}^{c}$, which provides an approximate solution to $\mathsf{M}^{c}$, or to $\mathsf{M}^{c}$ with a fixed initial state. |
|---|---|
| DOI: | 10.1109/TAC.2024.3523847 |
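The record contains only the abstract, so the paper's DP construction and its policy-iteration-type algorithm for the subordinate problem $\hat{\mathsf{M}}^{c}$ are not reproduced here. As background only, the sketch below is a generic policy iteration for a finite discounted MDP in which the constraint cost is folded into the reward through a fixed Lagrange multiplier `lam`; the multiplier, array shapes, and function name are illustrative assumptions, and this is not the algorithm derived in the paper.

```python
import numpy as np

def policy_iteration(P, r, c, lam=0.0, gamma=0.95):
    """Generic policy iteration on a finite discounted MDP.

    Illustrative only: the constraint cost `c` is folded into the reward via a
    fixed multiplier `lam` (a standard Lagrangian heuristic), which is NOT the
    DP structure or the policy-iteration-type algorithm of the paper.

    P : (S, A, S) transition probabilities
    r : (S, A) rewards
    c : (S, A) constraint costs
    """
    S, A, _ = P.shape
    reward = r - lam * c                 # scalarized one-step objective
    pi = np.zeros(S, dtype=int)          # deterministic, stationary initial policy
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly
        P_pi = P[np.arange(S), pi]       # (S, S) transition matrix under pi
        r_pi = reward[np.arange(S), pi]  # (S,) reward under pi
        v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
        # Policy improvement: greedy one-step lookahead
        q = reward + gamma * np.einsum("sat,t->sa", P, v)
        new_pi = q.argmax(axis=1)
        if np.array_equal(new_pi, pi):   # stable policy: optimal for the scalarized MDP
            return pi, v
        pi = new_pi
```

With `lam = 0` this reduces to standard unconstrained policy iteration over deterministic stationary policies; sweeping `lam` trades off reward against constraint cost but, unlike the construction described in the abstract, offers no guarantee of uniform feasibility or uniform optimality across initial states.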