An Online Actor–Critic Algorithm with Function Approximation for Constrained Markov Decision Processes
| Published in: | Journal of Optimization Theory and Applications, Vol. 153, No. 3, pp. 688-708 |
|---|---|
| Main authors: | Bhatnagar, Shalabh; Lakshmanan, K. |
| Format: | Journal Article |
| Language: | English |
| Publication details: | Boston: Springer US; Springer Nature B.V; 1 June 2012 |
| Subjects: | |
| ISSN: | 0022-3239, 1573-2878 |
| Abstract | We develop an online actor–critic reinforcement learning algorithm with function approximation for a problem of control under inequality constraints. We consider the long-run average cost Markov decision process (MDP) framework in which both the objective and the constraint functions are suitable policy-dependent long-run averages of certain sample path functions. The Lagrange multiplier method is used to handle the inequality constraints. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal solution. We also provide the results of numerical experiments on a problem of routing in a multi-stage queueing network with constraints on long-run average queue lengths. We observe that our algorithm exhibits good performance on this setting and converges to a feasible point. |
|---|---|
| Author | Lakshmanan, K.; Bhatnagar, Shalabh |
| Author_xml | 1. Bhatnagar, Shalabh (shalabh@csa.iisc.ernet.in), Department of Computer Science and Automation, Indian Institute of Science; 2. Lakshmanan, K., Department of Computer Science and Automation, Indian Institute of Science |
| ContentType | Journal Article |
| Copyright | Springer Science+Business Media, LLC 2012 |
| DOI | 10.1007/s10957-012-9989-5 |
| Discipline | Engineering Mathematics |
| EISSN | 1573-2878 |
| EndPage | 708 |
| ExternalDocumentID | 2660434381 10_1007_s10957_012_9989_5 |
| Genre | Feature |
| ISICitedReferencesCount | 67 |
| ISSN | 0022-3239 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 3 |
| Keywords | Constrained Markov decision processes, Function approximation, Actor–critic algorithm, Long-run average cost criterion |
| Language | English |
| License | http://www.springer.com/tdm |
| PageCount | 21 |
| PublicationCentury | 2000 |
| PublicationDate | 20120600 2012-6-00 20120601 |
| PublicationDateYYYYMMDD | 2012-06-01 |
| PublicationDate_xml | – month: 6 year: 2012 text: 20120600 |
| PublicationDecade | 2010 |
| PublicationPlace | Boston |
| PublicationPlace_xml | – name: Boston – name: New York |
| PublicationTitle | Journal of optimization theory and applications |
| PublicationTitleAbbrev | J Optim Theory Appl |
| PublicationYear | 2012 |
| Publisher | Springer US Springer Nature B.V |
| Publisher_xml | – name: Springer US – name: Springer Nature B.V |
| StartPage | 688 |
| SubjectTerms | Algorithms, Applications of Mathematics, Approximation, Asymptotic properties, Calculus of Variations and Optimal Control; Optimization, Costs, Engineering, Lagrange multiplier, Markov analysis, Markov processes, Mathematical analysis, Mathematical models, Mathematics, Mathematics and Statistics, On-line systems, Online, Operations Research/Decision Theory, Optimization, Queues, Random variables, Theory of Computation |
| Title | An Online Actor–Critic Algorithm with Function Approximation for Constrained Markov Decision Processes |
| URI | https://link.springer.com/article/10.1007/s10957-012-9989-5 https://www.proquest.com/docview/1013471582 https://www.proquest.com/docview/1031300633 |
| Volume | 153 |
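
As a reading aid for the abstract's Lagrange multiplier step, the block below sketches the standard Lagrangian relaxation of a constrained long-run average cost MDP. The notation is assumed for illustration and is not taken from the paper: $\pi$ is the policy, $c$ the single-stage cost, $g_i$ the single-stage constraint costs, $\alpha_i$ the constraint thresholds, $\gamma_i$ the multipliers, and $N$ the number of constraints. The limits are assumed to exist.

```latex
% Illustrative notation only (assumed, not the paper's own symbols).
% J: long-run average cost; G_i: long-run average of the i-th constraint cost.
\begin{align*}
J(\pi) &= \lim_{n\to\infty} \frac{1}{n}\,
          \mathbb{E}\Big[\sum_{m=0}^{n-1} c(s_m,a_m)\Big], \\
G_i(\pi) &= \lim_{n\to\infty} \frac{1}{n}\,
          \mathbb{E}\Big[\sum_{m=0}^{n-1} g_i(s_m,a_m)\Big],
          \quad i = 1,\dots,N, \\
&\min_{\pi}\; J(\pi) \quad \text{subject to} \quad
          G_i(\pi) \le \alpha_i, \quad i = 1,\dots,N, \\
L(\pi,\gamma) &= J(\pi) + \sum_{i=1}^{N} \gamma_i\,\big(G_i(\pi) - \alpha_i\big),
          \qquad \gamma_i \ge 0 .
\end{align*}
```

Under this reading, for fixed multipliers $\gamma$ the Lagrangian $L(\pi,\gamma)$ is itself the long-run average of the single-stage cost $c + \sum_i \gamma_i (g_i - \alpha_i)$, so an unconstrained actor–critic scheme can in principle be applied to it while the multipliers are increased whenever a constraint is violated. How the paper schedules these updates is not stated in this record.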