Faster algorithm and sharper analysis for constrained Markov decision process

Bibliographic Details
Published in: Operations Research Letters, Vol. 54, Article 107107 (2024)
Authors: Li, Tianjiao; Guan, Ziwei; Zou, Shaofeng; Xu, Tengyu; Liang, Yingbin; Lan, Guanghui
Format: Journal Article
Language: English
Published: Elsevier B.V., 1 May 2024
Copyright: © 2024 Elsevier B.V.
Subjects: Constrained Markov decision process; Primal-dual algorithm; Accelerated gradient method; Entropy regularization; Policy optimization
ISSN: 0167-6377 (print); EISSN: 1872-7468
DOI: 10.1016/j.orl.2024.107107
Online Access: Full text
Abstract: The problem of the constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated reward subject to constraints on its utilities/costs. We propose a new primal-dual approach with a novel integration of entropy regularization and Nesterov's accelerated gradient method. The proposed approach is shown to converge to the global optimum with a complexity of Õ(1/ε) in terms of both the optimality gap and the constraint violation, which improves the complexity of existing primal-dual approaches by a factor of O(1/ε).
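For reference, the problem the abstract describes is the standard discounted CMDP. The formulation below uses our own notation (V_r for the reward value, V_g for the utility value, b for the constraint level, τ for the entropy weight), which may differ from the paper's:

    \max_{\pi}\; V_r^{\pi}(\rho) \;=\; \mathbb{E}_{\pi}\Big[\textstyle\sum_{t=0}^{\infty}\gamma^{t}\, r(s_t,a_t)\,\Big|\, s_0\sim\rho\Big]
    \quad\text{subject to}\quad V_g^{\pi}(\rho)\;\ge\; b,

with the entropy-regularized Lagrangian

    L_{\tau}(\pi,\lambda) \;=\; V_r^{\pi}(\rho) \;+\; \tau\,\mathcal{H}(\rho,\pi) \;+\; \lambda\big(V_g^{\pi}(\rho)-b\big), \qquad \lambda \ge 0,

where \mathcal{H}(\rho,\pi) is a discounted entropy term. A primal-dual method alternates ascent on π with descent on λ. The following NumPy sketch illustrates that template on a tiny random tabular CMDP. It is not the paper's algorithm: the policy step is a generic entropy-regularized softmax update, the dual step is a projected gradient step with a simple Nesterov-style extrapolation, and the instance data, step sizes, and constraint level are made-up illustrative values.

import numpy as np

# Minimal illustrative sketch (NOT the paper's algorithm): an entropy-regularized
# primal-dual loop on a small random tabular CMDP. All data and step sizes are
# made-up illustrative choices.
rng = np.random.default_rng(0)
nS, nA, gamma, tau = 4, 3, 0.9, 0.1            # states, actions, discount, entropy weight
P = rng.dirichlet(np.ones(nS), size=(nS, nA))  # P[s, a, :] = next-state distribution
r = rng.uniform(size=(nS, nA))                 # reward to maximize
g = rng.uniform(size=(nS, nA))                 # utility appearing in the constraint
b = 3.0                                        # require discounted utility V_g >= b
rho = np.full(nS, 1.0 / nS)                    # initial-state distribution

def v_and_q(pi, c):
    """Discounted value rho @ v and Q-values of per-step payoff c under policy pi."""
    Ppi = np.einsum('sap,sa->sp', P, pi)       # state transition matrix under pi
    cpi = np.einsum('sa,sa->s', c, pi)         # expected per-step payoff
    v = np.linalg.solve(np.eye(nS) - gamma * Ppi, cpi)
    return rho @ v, c + gamma * np.einsum('sap,p->sa', P, v)

pi = np.full((nS, nA), 1.0 / nA)               # start from the uniform policy
lam = lam_prev = 0.0                           # dual variable for the constraint
eta_pi, eta_lam = 1.0, 0.5                     # illustrative step sizes

for k in range(1, 201):
    # Nesterov-style extrapolation (momentum) on the dual variable.
    lam_bar = lam + (k - 1.0) / (k + 2.0) * (lam - lam_prev)
    # Entropy-regularized softmax policy step on the Lagrangian payoff r + lam_bar*g:
    # pi_new(a|s) is proportional to pi(a|s)^(1 - eta*tau) * exp(eta * Q(s, a)).
    _, Q = v_and_q(pi, r + lam_bar * g)
    logits = (1.0 - eta_pi * tau) * np.log(pi) + eta_pi * Q
    logits -= logits.max(axis=1, keepdims=True)  # stabilize before exponentiating
    pi = np.exp(logits)
    pi /= pi.sum(axis=1, keepdims=True)
    # Projected dual step: lower lambda when the constraint is slack, raise it otherwise.
    vg, _ = v_and_q(pi, g)
    lam_prev, lam = lam, max(0.0, lam_bar - eta_lam * (vg - b))

vr, _ = v_and_q(pi, r)
vg, _ = v_and_q(pi, g)
print(f"V_r = {vr:.3f}, V_g = {vg:.3f} (target >= {b}), lambda = {lam:.3f}")

Running the sketch, λ settles near zero when the constraint is slack and grows when it binds; the entropy weight τ trades regularization bias against a better-conditioned, faster-converging subproblem, a trade-off that entropy-regularized analyses typically exploit.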
Authors and Affiliations:
– Li, Tianjiao (Georgia Institute of Technology, United States of America; tli432@gatech.edu; ORCID: 0000-0001-6660-0883)
– Guan, Ziwei (The Ohio State University, United States of America)
– Zou, Shaofeng (University at Buffalo, The State University of New York, United States of America)
– Xu, Tengyu (Meta, United States of America)
– Liang, Yingbin (The Ohio State University, United States of America)
– Lan, Guanghui (Georgia Institute of Technology, United States of America)