Residual Sarsa algorithm with function approximation

Published in: Cluster Computing, Vol. 22, Issue Suppl 1, pp. 795–807
Main authors: Qiming, Fu; Wen, Hu; Quan, Liu; Heng, Luo; Lingyao, Hu; Jianping, Chen
Medium: Journal Article
Language: English
Published: New York: Springer US, 01.01.2019
Springer Nature B.V
ISSN: 1386-7857, 1573-7543
Online access: Get full text
Abstract In this work, we propose an efficient algorithm, the residual Sarsa algorithm with function approximation (FARS), to improve the performance of the traditional Sarsa algorithm; the gradient-descent method is used to update the function parameter vector. In the learning process, the Bellman residual method is adopted to guarantee convergence of the algorithm, and a new rule for updating the parameter vectors of the action-value function is adopted to solve the unstable and slow convergence problems. To accelerate the convergence rate of the algorithm, we introduce a new factor, named the forgotten factor, which helps improve the robustness of the algorithm's performance. On two classical reinforcement learning benchmark problems, the experimental results show that the FARS algorithm performs better than other related reinforcement learning algorithms.
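The update scheme the abstract describes can be sketched for the linear case. The sketch below is illustrative only, not the paper's FARS: the environment, one-hot feature map, and hyperparameters are assumptions, and the forgotten factor is omitted. It shows Sarsa with linear function approximation where the parameter vector descends the full Bellman-residual gradient, so both the current and successor features enter each update.

```python
import numpy as np

def residual_sarsa(env_step, n_states, n_actions, episodes=300,
                   alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Sarsa with linear function approximation and a Bellman-residual
    (residual-gradient) parameter update. Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    dim = n_states * n_actions
    w = np.zeros(dim)

    def phi(s, a):
        # One-hot (tabular) features; any linear feature map would do.
        f = np.zeros(dim)
        f[s * n_actions + a] = 1.0
        return f

    def q(s, a):
        return phi(s, a) @ w

    def policy(s):
        # epsilon-greedy over the approximate action values
        if rng.random() < epsilon:
            return int(rng.integers(n_actions))
        return int(np.argmax([q(s, a) for a in range(n_actions)]))

    for _ in range(episodes):
        s, done = 0, False
        a = policy(s)
        while not done:
            s2, r, done = env_step(s, a, rng)
            a2 = policy(s2)
            # Bellman residual: delta = r + gamma * Q(s',a') - Q(s,a)
            delta = (r - q(s, a)) if done else (r + gamma * q(s2, a2) - q(s, a))
            # Residual-gradient step: differentiating delta**2 / 2 w.r.t. w
            # brings in the successor features with a -gamma factor.
            grad = phi(s, a) if done else phi(s, a) - gamma * phi(s2, a2)
            w += alpha * delta * grad
            s, a = s2, a2
    return w
```

Plain gradient-descent Sarsa would use only phi(s, a) as the gradient (a semi-gradient), which can diverge; the residual form performs true gradient descent on the squared Bellman error, which is the convergence guarantee the abstract refers to.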
Author Qiming, Fu
Wen, Hu
Jianping, Chen
Quan, Liu
Heng, Luo
Lingyao, Hu
Author_xml – sequence: 1
  givenname: Fu
  surname: Qiming
  fullname: Qiming, Fu
  organization: Institute of Electronics and Information Engineering, Suzhou University of Science and Technology, Jiangsu Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou Key Laboratory of Mobile Networking and Applied Technologies, Suzhou University of Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University
– sequence: 2
  givenname: Hu
  surname: Wen
  fullname: Wen, Hu
  organization: Institute of Electronics and Information Engineering, Suzhou University of Science and Technology, Jiangsu Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou Key Laboratory of Mobile Networking and Applied Technologies, Suzhou University of Science and Technology
– sequence: 3
  givenname: Liu
  surname: Quan
  fullname: Quan, Liu
  organization: School of Computer Science and Technology, Soochow University, Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Collaborative Innovation Center of Novel Software Technology and Industrialization
– sequence: 4
  givenname: Luo
  surname: Heng
  fullname: Heng, Luo
  organization: Institute of Electronics and Information Engineering, Suzhou University of Science and Technology, Jiangsu Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou Key Laboratory of Mobile Networking and Applied Technologies, Suzhou University of Science and Technology
– sequence: 5
  givenname: Hu
  surname: Lingyao
  fullname: Lingyao, Hu
  organization: Institute of Electronics and Information Engineering, Suzhou University of Science and Technology, Jiangsu Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou Key Laboratory of Mobile Networking and Applied Technologies, Suzhou University of Science and Technology
– sequence: 6
  givenname: Chen
  surname: Jianping
  fullname: Jianping, Chen
  email: fuqinming276ming@163.com
  organization: Institute of Electronics and Information Engineering, Suzhou University of Science and Technology, Jiangsu Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou Key Laboratory of Mobile Networking and Applied Technologies, Suzhou University of Science and Technology
CitedBy_id crossref_primary_10_1007_s10586_022_03742_9
Cites_doi 10.1016/j.fss.2009.05.003
10.1007/978-3-319-26532-2_13
10.1007/978-3-319-46675-0_24
10.1109/ICC.2016.7511405
10.1145/1553374.1553501
10.1007/s10994-011-5251-x
10.1109/IJCNN.2016.7727694
10.1109/ADPRL.2011.5967355
10.1023/A:1007678930559
10.1080/10255810213482
10.1007/s10994-007-5038-2
10.1109/21.229449
10.1016/j.asoc.2015.09.003
10.1145/2739480.2754783
10.1145/276698.276876
10.1145/2576768.2598258
ContentType Journal Article
Copyright Springer Science+Business Media, LLC 2017
Springer Science+Business Media, LLC 2017.
DOI 10.1007/s10586-017-1303-8
DatabaseName CrossRef
ProQuest SciTech Collection
ProQuest Technology Collection
ProQuest Central UK/Ireland
Advanced Technologies & Computer Science Collection
ProQuest Central Essentials
ProQuest Central
Technology Collection
ProQuest One Community College
ProQuest Central
ProQuest Central Student
SciTech Premium Collection
ProQuest Computer Science Collection
Computer Science Database
Advanced Technologies & Aerospace Database
ProQuest Advanced Technologies & Aerospace Collection
ProQuest Central Premium
ProQuest One Academic (New)
ProQuest One Academic Middle East (New)
ProQuest One Academic Eastern Edition (DO NOT USE)
ProQuest One Applied & Life Sciences
ProQuest One Academic (retired)
ProQuest One Academic UKI Edition
DatabaseTitle CrossRef
Advanced Technologies & Aerospace Collection
Computer Science Database
ProQuest Central Student
Technology Collection
ProQuest One Academic Middle East (New)
ProQuest Advanced Technologies & Aerospace Collection
ProQuest Central Essentials
ProQuest Computer Science Collection
ProQuest One Academic Eastern Edition
SciTech Premium Collection
ProQuest One Community College
ProQuest Technology Collection
ProQuest SciTech Collection
ProQuest Central
Advanced Technologies & Aerospace Database
ProQuest One Applied & Life Sciences
ProQuest One Academic UKI Edition
ProQuest Central Korea
ProQuest Central (New)
ProQuest One Academic
ProQuest One Academic (New)
DatabaseTitleList Advanced Technologies & Aerospace Collection

Database_xml – sequence: 1
  dbid: P5Z
  name: Advanced Technologies & Aerospace Database
  url: https://search.proquest.com/hightechjournals
  sourceTypes: Aggregation Database
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISSN 1573-7543
EndPage 807
ExternalDocumentID 10_1007_s10586_017_1303_8
GrantInformation_xml – fundername: Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University
  grantid: 93K172014K04
– fundername: National Natural Science Foundation of China
  grantid: 61373094; 61472262
  funderid: http://dx.doi.org/10.13039/501100001809
– fundername: National Natural Science Foundation of China
  grantid: 61272005; 61303108
  funderid: http://dx.doi.org/10.13039/501100001809
– fundername: National Natural Science Foundation of China
  grantid: 61672371; 61602334
  funderid: http://dx.doi.org/10.13039/501100001809
– fundername: Foundation of Ministry of Housing and Urban-Rural Development of the People’s Republic of China
  grantid: 2015-K1-047
– fundername: High School Natural Foundation of Jiangsu
  grantid: BK2012616; 13KJB520020
– fundername: National Natural Science Foundation of China
  grantid: 61502329; 61502323
  funderid: http://dx.doi.org/10.13039/501100001809
– fundername: Natural Science Foundation of Jiangsu
  grantid: BK20140283
– fundername: Suzhou Industrial application of basic research program part
  grantid: SYG201422
ISICitedReferencesCount 2
ISSN 1386-7857
IngestDate Wed Nov 26 14:53:20 EST 2025
Sat Nov 29 05:40:09 EST 2025
Tue Nov 18 22:02:29 EST 2025
Fri Feb 21 02:36:57 EST 2025
IsPeerReviewed true
IsScholarly true
Issue Suppl 1
Keywords Sarsa algorithm
Gradient descent
Function approximation
Bellman residual
Reinforcement learning
Language English
LinkModel DirectLink
Notes ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
PQID 2918244642
PQPubID 2043865
PageCount 13
ParticipantIDs proquest_journals_2918244642
crossref_primary_10_1007_s10586_017_1303_8
crossref_citationtrail_10_1007_s10586_017_1303_8
springer_journals_10_1007_s10586_017_1303_8
PublicationCentury 2000
PublicationDate 2019-01-01
PublicationDateYYYYMMDD 2019-01-01
PublicationDate_xml – month: 01
  year: 2019
  text: 2019-01-01
  day: 01
PublicationDecade 2010
PublicationPlace New York
PublicationPlace_xml – name: New York
– name: Dordrecht
PublicationSubtitle The Journal of Networks, Software Tools and Applications
PublicationTitle Cluster computing
PublicationTitleAbbrev Cluster Comput
PublicationYear 2019
Publisher Springer US
Springer Nature B.V
Publisher_xml – name: Springer US
– name: Springer Nature B.V
References Derhami, V., Majd, V.J., Ahmadabadi, M.N.: Exploration and exploitation balance management in fuzzy reinforcement learning. Fuzzy Sets Syst. 161(4), 578–595 (2010). doi:10.1016/j.fss.2009.05.003
Liu, Q., Li, J., Fu, Q.M.: A multiple-goal Sarsa(λ) algorithm based on lost reward of greatest mass. J. Electron. 41(8), 1469–1473 (2013)
Sutton, R.S., Maei, H.R., Szepesvári, C., et al.: A convergent O(n) temporal-difference algorithm for off-policy learning with linear function approximation. In: Proceedings of Advances in Neural Information Processing Systems, Vancouver, Canada (2009)
Maei, H.R., Szepesvári, C., Bhatnagar, S., et al.: Toward off-policy learning control with function approximation. In: Proceedings of the 27th International Conference on Machine Learning, Haifa, Israel (2010)
Xiao, F., Liu, Q., Fu, Q.M.: Gradient descent Sarsa(λ) algorithm based on the adaptive potential function shaping reward mechanism. J. Commun. 1, 77–88 (2013)
Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of the 30th Annual ACM Symposium on Theory of Computing, New York, USA (1998)
You, S.H., Liu, Q., Fu, Q.M., et al.: A Bayesian Sarsa learning algorithm with bandit-based method. In: Proceedings of the International Conference on Neural Information Processing (2015)
Akimoto, Y., Auger, A., Hansen, N.: Comparison-based natural gradient optimization in high dimension. In: Proceedings of the Annual Conference on Genetic and Evolutionary Computation, Vancouver, Canada (2014)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
Sutton, R.S.: Learning to predict by the methods of temporal differences. Mach. Learn. 3, 9–44 (1988)
Singh, S., Jaakkola, T., Littman, M.L.: Convergence results for single-step on-policy reinforcement-learning algorithms. Mach. Learn. 38(3), 287–308 (2000). doi:10.1023/A:1007678930559
Chettibi, S., Chikhi, S.: Dynamic fuzzy logic and reinforcement learning for adaptive energy efficient routing in mobile ad-hoc networks. Appl. Soft Comput. 38, 321–328 (2016). doi:10.1016/j.asoc.2015.09.003
Veeriah, V., Van Seijen, H., Sutton, R.S.: Forward actor-critic for nonlinear function approximation in reinforcement learning. In: Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, São Paulo, Brazil (2017)
Fu, Q.M., Liu, Q., You, S.H.: A novel fast Sarsa algorithm based on value function transfer. J. Electron. 42(11), 2157–2161 (2014)
Saadatjou, F., Derhami, V., Majd, V.: Balance of exploration and exploitation in deterministic and stochastic environment in reinforcement learning. In: Proceedings of the 11th Annual Computer Society of Iran Computer Conference, Tehran, Iran (2006)
Kalyanakrishnan, S., Stone, P.: Characterizing reinforcement learning methods through parameterized learning problems. Mach. Learn. 84(1–2), 205–247 (2011). doi:10.1007/s10994-011-5251-x
Jaśkowski, W., Szubert, M., Liskowski, P., et al.: High-dimensional function approximation for knowledge-free reinforcement learning: a case study in SZ-Tetris. In: Proceedings of the Annual Conference on Genetic and Evolutionary Computation, New York, USA (2015)
Busoniu, L., Babuska, R., De Schutter, B.: Reinforcement Learning and Dynamic Programming Using Function Approximators. CRC Press, New York (2010)
Sutton, R.S., Maei, H.R., Precup, D.: Fast gradient-descent methods for temporal-difference learning with linear function approximation. In: Proceedings of the 26th International Conference on Machine Learning, New York, USA (2009)
Go, C.K., Lao, B., Yoshimoto, J., et al.: A reinforcement learning approach to the shepherding task using Sarsa. In: Proceedings of the International Joint Conference on Neural Networks (IJCNN), Kuala Lumpur, Malaysia (2016)
Antos, A., Szepesvári, C., Munos, R.: Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Mach. Learn. 71(1), 89–129 (2008). doi:10.1007/s10994-007-5038-2
Zhu, H., Zhu, F., Fu, Y., et al.: A kernel-based Sarsa(λ) algorithm with clustering-based sample sparsification. In: Proceedings of the International Conference on Neural Information Processing, Kyoto, Japan (2016)
Van Seijen, H.: Effective multi-step temporal-difference learning for non-linear function approximation. arXiv preprint arXiv:1608.05151 (2016)
Geist, M., Pietquin, O.: Parametric value function approximation. In: Proceedings of the IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, Paris, France (2011)
Barnard, E.: Temporal-difference methods and Markov models. IEEE Trans. Syst. Man Cybern. 23(2), 357–365 (1993). doi:10.1109/21.229449
Yen, G., Yang, F., Hickey, T.: Coordination of exploration and exploitation in a dynamic environment. Int. J. Smart Eng. Syst. Des. 4(3), 177–182 (2002). doi:10.1080/10255810213482
Ortiz, A., Al-Shatri, H., Li, X., et al.: Reinforcement learning for energy harvesting point-to-point communications. In: Proceedings of the IEEE International Conference on Communications (ICC), Kuala Lumpur, Malaysia (2016)
Liu, Q., Fu, Q.M., Gong, S.R., Fu, Y.C., Cui, Z.M.: Reinforcement learning algorithm based on minimum state method and average reward. J. Commun. 32(1), 66–71 (2011)
SourceID proquest
crossref
springer
SourceType Aggregation Database
Enrichment Source
Index Database
Publisher
StartPage 795
SubjectTerms Algorithms
Approximation
Computer Communication Networks
Computer Science
Convergence
Distance learning
Fuzzy logic
Machine learning
Mathematical analysis
Operating Systems
Performance enhancement
Processor Architectures
Title Residual Sarsa algorithm with function approximation
URI https://link.springer.com/article/10.1007/s10586-017-1303-8
https://www.proquest.com/docview/2918244642
Volume 22
journalDatabaseRights – providerCode: PRVPQU
  databaseName: Advanced Technologies & Aerospace Database
  customDbUrl:
  eissn: 1573-7543
  dateEnd: 20241207
  omitProxy: false
  ssIdentifier: ssj0009729
  issn: 1386-7857
  databaseCode: P5Z
  dateStart: 19980101
  isFulltext: true
  titleUrlDefault: https://search.proquest.com/hightechjournals
  providerName: ProQuest
– providerCode: PRVPQU
  databaseName: Computer Science Database
  customDbUrl:
  eissn: 1573-7543
  dateEnd: 20241207
  omitProxy: false
  ssIdentifier: ssj0009729
  issn: 1386-7857
  databaseCode: K7-
  dateStart: 19980101
  isFulltext: true
  titleUrlDefault: http://search.proquest.com/compscijour
  providerName: ProQuest
– providerCode: PRVPQU
  databaseName: ProQuest Central (subscription)
  customDbUrl:
  eissn: 1573-7543
  dateEnd: 20241207
  omitProxy: false
  ssIdentifier: ssj0009729
  issn: 1386-7857
  databaseCode: BENPR
  dateStart: 19980101
  isFulltext: true
  titleUrlDefault: https://www.proquest.com/central
  providerName: ProQuest
– providerCode: PRVAVX
  databaseName: SpringerLINK Contemporary 1997-Present
  customDbUrl:
  eissn: 1573-7543
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0009729
  issn: 1386-7857
  databaseCode: RSV
  dateStart: 19980101
  isFulltext: true
  titleUrlDefault: https://link.springer.com/search?facet-content-type=%22Journal%22
  providerName: Springer Nature
linkProvider ProQuest