Faster algorithm and sharper analysis for constrained Markov decision process

Bibliographic details
Published in:	Operations Research Letters, Vol. 54; p. 107107
Main authors:	Li, Tianjiao; Guan, Ziwei; Zou, Shaofeng; Xu, Tengyu; Liang, Yingbin; Lan, Guanghui
Format:	Journal Article
Language:	English
Published:	Elsevier B.V. 1 May 2024
Subjects:	Accelerated gradient method; Constrained Markov decision process; Entropy regularization; Policy optimization; Primal-dual algorithm
ISSN:	0167-6377, 1872-7468

Description
Summary:	The problem of constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated reward subject to constraints on its utilities/costs. We propose a new primal-dual approach with a novel integration of entropy regularization and Nesterov's accelerated gradient method. The proposed approach is shown to converge to the global optimum with a complexity of Õ(1/ϵ) in terms of the optimality gap and the constraint violation, which improves the complexity of the existing primal-dual approaches by a factor of O(1/ϵ).
DOI:	10.1016/j.orl.2024.107107