An Online Actor–Critic Algorithm with Function Approximation for Constrained Markov Decision Processes

Published in: Journal of optimization theory and applications, Volume 153, Issue 3, pp. 688-708
Main authors: Bhatnagar, Shalabh; Lakshmanan, K.
Format: Journal Article
Language: English
Publication details: Boston, Springer US, 1 June 2012
Springer Nature B.V
ISSN: 0022-3239, 1573-2878
Abstract We develop an online actor–critic reinforcement learning algorithm with function approximation for a problem of control under inequality constraints. We consider the long-run average cost Markov decision process (MDP) framework in which both the objective and the constraint functions are suitable policy-dependent long-run averages of certain sample path functions. The Lagrange multiplier method is used to handle the inequality constraints. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal solution. We also provide the results of numerical experiments on a problem of routing in a multi-stage queueing network with constraints on long-run average queue lengths. We observe that our algorithm performs well in this setting and converges to a feasible point.
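The ingredients named in the abstract (an average-cost critic, a policy-gradient actor, and a Lagrange multiplier for the inequality constraint) can be illustrated with a minimal sketch. This is not the authors' algorithm or their queueing experiment: the two-state problem, the cost tables, the step-size exponents, and all names (`run`, `softmax_policy`, `ALPHA`, `LAM_MAX`) are assumptions made up for illustration. The sketch only shows the general pattern of updating the critic on the fastest timescale, the actor on a slower one, and the multiplier on the slowest.

```python
import math
import random

# Hypothetical toy problem (NOT from the paper): 2 states, 2 actions.
N_STATES, N_ACTIONS = 2, 2
P_TO_1 = [[0.2, 0.8], [0.3, 0.9]]   # P_TO_1[s][a] = prob. of moving to state 1
COST = [[1.0, 0.0], [2.0, 0.5]]     # objective single-stage cost c(s, a)
CONSTR = [[0.0, 1.0], [0.0, 1.0]]   # constraint single-stage cost g(s, a)
ALPHA = 0.4                         # require long-run average of g <= ALPHA
LAM_MAX = 50.0                      # projection bound for the multiplier


def softmax_policy(theta, s):
    """Action probabilities from per-(state, action) preferences."""
    m = max(theta[s])
    exps = [math.exp(x - m) for x in theta[s]]
    z = sum(exps)
    return [e / z for e in exps]


def run(n_steps=20000, seed=0):
    rng = random.Random(seed)
    theta = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # actor parameters
    v = [0.0] * N_STATES   # critic weights (linear in one-hot state features)
    rho = 0.0              # running estimate of the average Lagrangian cost
    g_avg = 0.0            # running estimate of the average constraint cost
    lam = 0.0              # Lagrange multiplier
    s = 0
    for n in range(1, n_steps + 1):
        a_n = 1.0 / n ** 0.55   # fastest timescale: critic and averages
        b_n = 1.0 / n ** 0.8    # slower timescale: actor
        c_n = 1.0 / n           # slowest timescale: multiplier
        probs = softmax_policy(theta, s)
        a = 0 if rng.random() < probs[0] else 1
        s_next = 1 if rng.random() < P_TO_1[s][a] else 0
        g = CONSTR[s][a]
        # Single-stage cost of the Lagrangian relaxation.
        lagr_cost = COST[s][a] + lam * (g - ALPHA)
        # Critic: average-cost temporal-difference error and updates.
        delta = lagr_cost - rho + v[s_next] - v[s]
        rho += a_n * (lagr_cost - rho)
        g_avg += a_n * (g - g_avg)
        v[s] += a_n * delta
        # Actor: descend along delta * grad log pi(a | s) (cost minimization).
        for b in range(N_ACTIONS):
            grad = (1.0 if b == a else 0.0) - probs[b]
            theta[s][b] -= b_n * delta * grad
        # Multiplier: projected ascent on the estimated constraint violation.
        lam = min(LAM_MAX, max(0.0, lam + c_n * (g_avg - ALPHA)))
        s = s_next
    return theta, lam, g_avg
```

Calling `run()` returns the learned policy preferences, the final multiplier, and the running estimate of the average constraint cost. Whether that estimate ends up below `ALPHA` depends on the toy numbers and the run length, so the sketch makes no claim about reproducing the convergence result stated in the abstract.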
Authors Bhatnagar, Shalabh (shalabh@csa.iisc.ernet.in), Department of Computer Science and Automation, Indian Institute of Science
Lakshmanan, K., Department of Computer Science and Automation, Indian Institute of Science
Copyright Springer Science+Business Media, LLC 2012
DOI 10.1007/s10957-012-9989-5
Discipline Engineering
Mathematics
IsPeerReviewed true
IsScholarly true
Issue 3
Keywords Constrained Markov decision processes
Function approximation
Actor–critic algorithm
Long-run average cost criterion
Language English
SubjectTerms Algorithms
Applications of Mathematics
Approximation
Asymptotic properties
Calculus of Variations and Optimal Control; Optimization
Costs
Engineering
Lagrange multiplier
Markov analysis
Markov processes
Mathematical analysis
Mathematical models
Mathematics
Mathematics and Statistics
On-line systems
Online
Operations Research/Decision Theory
Optimization
Queues
Random variables
Theory of Computation