Approximate Constrained Discounted Dynamic Programming With Uniform Feasibility and Optimality

Bibliographic details
Published in:	IEEE Transactions on Automatic Control, Vol. 70, No. 6, pp. 4031-4036
Main author:	Chang, Hyeong Soo
Format:	Journal Article
Language:	English
Published:	New York: IEEE, 1 June 2025 (The Institute of Electrical and Electronics Engineers, Inc. (IEEE))
Subjects:	Algorithms; Approximation algorithms; Constraints; Costs; Dynamic programming; Dynamic programming (DP); Feasibility; Markov decision process; Markov processes; optimality equation; Optimization; Probability distribution; Programming; Reviews; Scheduling; Scheduling algorithms; Throughput; Trajectory; uniform-optimality
ISSN:	0018-9286, 1558-2523

Description
Summary:	An important question about the finite constrained Markov decision process (CMDP) problem is whether there exists a condition under which a uniformly optimal and uniformly feasible policy, one that achieves the optimal value at all initial states, exists in the set of deterministic, history-independent, and stationary policies, and whether the CMDP problem under that condition can be solved by dynamic programming (DP). The question matters because the crux of the unconstrained MDP theory developed by Bellman lies in the answer to the same existence question for an optimal policy of an MDP. Although CMDPs have been studied for years, no work has responded to this open question since it was raised in the literature about three decades ago. We establish, as one answer to this question, that any finite CMDP problem $\mathsf{M}^{c}$ inherently "contains" a DP structure in its "subordinate" CMDP problem $\hat{\mathsf{M}}^{c}$, which is induced from the parameters of $\mathsf{M}^{c}$, and that $\hat{\mathsf{M}}^{c}$ is DP-solvable. We derive a policy-iteration-type algorithm for solving $\hat{\mathsf{M}}^{c}$ that provides an approximate solution to $\mathsf{M}^{c}$, or to $\mathsf{M}^{c}$ with a fixed initial state.
DOI:	10.1109/TAC.2024.3523847