Faster algorithm and sharper analysis for constrained Markov decision process
The problem of constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated reward subject to constraints on its utilities/costs. We propose a new primal-dual approach with a novel integration of entropy regularization and Nesterov's accelerated gradient method. The proposed approach is shown to converge to the global optimum with a complexity of Õ(1/ϵ) in terms of the optimality gap and the constraint violation, which improves the complexity of the existing primal-dual approaches by a factor of O(1/ϵ).
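The abstract describes the approach only at a high level. As an illustration of the general template it refines, the following is a minimal sketch (not the authors' algorithm) of an entropy-regularized primal-dual iteration for a toy tabular CMDP: a softmax policy is updated by exact policy gradient on the entropy-regularized Lagrangian, while the Lagrange multiplier takes a projected dual step. All problem data, step sizes, and the plain (non-accelerated) dual update are assumptions made for illustration; the paper's method additionally integrates Nesterov's accelerated gradient method, which is how it obtains the Õ(1/ϵ) complexity in place of the Õ(1/ϵ²) of prior primal-dual analyses.

```python
# Minimal illustrative sketch (NOT the paper's algorithm): entropy-regularized
# primal-dual iteration for a toy tabular CMDP  max V_r(pi) s.t. V_g(pi) >= b.
# Problem data, step sizes, and iteration count are arbitrary assumptions;
# the paper additionally uses Nesterov acceleration and a sharper analysis.
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma, tau = 4, 3, 0.9, 0.05         # states, actions, discount, entropy weight
P = rng.dirichlet(np.ones(S), size=(S, A))  # transition kernel P[s, a] -> dist over s'
r = rng.uniform(size=(S, A))                # per-step reward
g = rng.uniform(size=(S, A))                # per-step utility (constrained quantity)
b = 2.0                                     # constraint threshold: require V_g >= b
rho = np.full(S, 1.0 / S)                   # initial-state distribution

def policy(theta):
    """Tabular softmax policy, computed in a numerically stable way."""
    z = np.exp(theta - theta.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def value(pi, c):
    """Exact discounted value of per-step payoff c under pi, from rho."""
    P_pi = np.einsum('sap,sa->sp', P, pi)
    v = np.linalg.solve(np.eye(S) - gamma * P_pi, (pi * c).sum(axis=1))
    return rho @ v

theta, lam = np.zeros((S, A)), 0.0
eta_theta, eta_lam = 0.5, 0.5
for _ in range(2000):
    pi = policy(theta)
    # Per-step payoff of the entropy-regularized Lagrangian: r + lam*g + tau*entropy.
    c = r + lam * g - tau * np.log(pi + 1e-12)
    P_pi = np.einsum('sap,sa->sp', P, pi)
    v = np.linalg.solve(np.eye(S) - gamma * P_pi, (pi * c).sum(axis=1))
    q = c + gamma * np.einsum('sap,p->sa', P, v)      # soft Q-values of the Lagrangian
    d = (1 - gamma) * np.linalg.solve(np.eye(S) - gamma * P_pi.T, rho)  # occupancy measure
    # Exact softmax policy gradient: ascend the regularized Lagrangian in theta.
    theta += eta_theta * d[:, None] * pi * (q - v[:, None])
    # Projected (plain, non-accelerated) dual descent on the multiplier lam >= 0.
    lam = max(0.0, lam - eta_lam * (value(pi, g) - b))

pi = policy(theta)
print(f"V_r = {value(pi, r):.3f}, V_g = {value(pi, g):.3f} (target >= {b}), lam = {lam:.3f}")
```

In this template the dual variable lam prices the constraint: whenever V_g falls below b, lam grows and tilts the policy update toward utility. The paper's analysis concerns how quickly such iterations shrink both the optimality gap and the constraint violation.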
Saved in:
| Published in: | Operations research letters, Vol. 54, Art. 107107 |
|---|---|
| Main authors: | Li, Tianjiao; Guan, Ziwei; Zou, Shaofeng; Xu, Tengyu; Liang, Yingbin; Lan, Guanghui |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V., 01.05.2024 |
| Keywords: | Primal-dual algorithm; Constrained Markov decision process; Accelerated gradient method; Entropy regularization; Policy optimization |
| ISSN: | 0167-6377, 1872-7468 |
| Online access: | Full text |
| Abstract | The problem of constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated reward subject to constraints on its utilities/costs. We propose a new primal-dual approach with a novel integration of entropy regularization and Nesterov's accelerated gradient method. The proposed approach is shown to converge to the global optimum with a complexity of Õ(1/ϵ) in terms of the optimality gap and the constraint violation, which improves the complexity of the existing primal-dual approaches by a factor of O(1/ϵ). |
|---|---|
| ArticleNumber | 107107 |
| Author | Liang, Yingbin Zou, Shaofeng Lan, Guanghui Guan, Ziwei Xu, Tengyu Li, Tianjiao |
| Author_xml | 1. Tianjiao Li (ORCID 0000-0001-6660-0883; tli432@gatech.edu), Georgia Institute of Technology, United States of America; 2. Ziwei Guan, The Ohio State University, United States of America; 3. Shaofeng Zou, University at Buffalo, The State University of New York, United States of America; 4. Tengyu Xu, Meta, United States of America; 5. Yingbin Liang, The Ohio State University, United States of America; 6. Guanghui Lan, Georgia Institute of Technology, United States of America |
| CitedBy_id | crossref_primary_10_1109_TPAMI_2024_3457538 crossref_primary_10_1109_TAC_2024_3523847 crossref_primary_10_3390_rs16122072 crossref_primary_10_1016_j_ejor_2025_08_038 |
| ContentType | Journal Article |
| Copyright | 2024 Elsevier B.V. |
| DOI | 10.1016/j.orl.2024.107107 |
| DatabaseName | CrossRef |
| DatabaseTitle | CrossRef |
| Discipline | Engineering Sciences (General) |
| EISSN | 1872-7468 |
| ExternalDocumentID | 10_1016_j_orl_2024_107107 S0167637724000439 |
| ISICitedReferencesCount | 8 |
| ISSN | 0167-6377 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | Primal-dual algorithm Constrained Markov decision process Accelerated gradient method Entropy regularization Policy optimization |
| Language | English |
| ORCID | 0000-0001-6660-0883 |
| PublicationDate | 2024-05-01 |
| PublicationTitle | Operations research letters |
| PublicationYear | 2024 |
| Publisher | Elsevier B.V |
| StartPage | 107107 |
| SubjectTerms | Accelerated gradient method Constrained Markov decision process Entropy regularization Policy optimization Primal-dual algorithm |
| Title | Faster algorithm and sharper analysis for constrained Markov decision process |
| URI | https://dx.doi.org/10.1016/j.orl.2024.107107 |
| Volume | 54 |