Residual Sarsa algorithm with function approximation
| Published in: | Cluster Computing: The Journal of Networks, Software Tools and Applications, Vol. 22, Issue Suppl 1, pp. 795-807 |
|---|---|
| Main authors: | Qiming Fu, Wen Hu, Quan Liu, Heng Luo, Lingyao Hu, Jianping Chen |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: Springer US, 01.01.2019 (Springer Nature B.V.) |
| ISSN: | 1386-7857, 1573-7543 |
| Online access: | https://link.springer.com/article/10.1007/s10586-017-1303-8 |
| Abstract: | In this work, we propose an efficient algorithm, the residual Sarsa algorithm with function approximation (FARS), to improve the performance of the traditional Sarsa algorithm; the gradient-descent method is used to update the function parameter vector. In the learning process, the Bellman residual method is adopted to guarantee the convergence of the algorithm, and a new rule for updating the parameter vectors of the action-value function is adopted to overcome unstable and slow convergence. To accelerate the convergence rate, we introduce a new factor, named the forgotten factor, which helps improve the robustness of the algorithm's performance. On two classical reinforcement learning benchmark problems, the experimental results show that the FARS algorithm outperforms other related reinforcement learning algorithms. |
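For orientation, the core update the abstract describes (gradient descent on a Bellman residual with a function approximator) can be sketched as follows. This is a generic Baird-style residual-gradient Sarsa step for a linear approximator, not the paper's FARS algorithm: the paper's specific parameter-update rule and its forgotten factor are not reproduced, and all names here (`residual_sarsa_update`, `phi_sa`, the step sizes) are illustrative assumptions.

```python
import numpy as np

def residual_sarsa_update(theta, phi_sa, phi_next_sa, reward, alpha=0.05, gamma=0.99):
    """One residual-gradient Sarsa step for a linear approximator Q(s, a) = theta . phi(s, a).

    The Bellman residual delta = r + gamma * Q(s', a') - Q(s, a) is driven toward zero
    by gradient descent on (1/2) * delta**2, differentiating through both value terms,
    which is the property that underlies the convergence guarantee mentioned in the abstract.
    """
    q_sa = theta @ phi_sa
    q_next = theta @ phi_next_sa
    delta = reward + gamma * q_next - q_sa          # Bellman residual
    grad = gamma * phi_next_sa - phi_sa             # d(delta)/d(theta)
    theta = theta - alpha * delta * grad            # descend on (1/2) * delta^2
    return theta, delta

# Minimal usage with random features, purely to show the call signature.
rng = np.random.default_rng(0)
theta = np.zeros(8)
phi_sa, phi_next_sa = rng.normal(size=8), rng.normal(size=8)
theta, delta = residual_sarsa_update(theta, phi_sa, phi_next_sa, reward=1.0)
```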
| Author affiliations: | Qiming Fu, Wen Hu, Heng Luo, Lingyao Hu, and Jianping Chen (corresponding author, fuqinming276ming@163.com): Institute of Electronics and Information Engineering, Jiangsu Key Laboratory of Intelligent Building Energy Efficiency, and Suzhou Key Laboratory of Mobile Networking and Applied Technologies, Suzhou University of Science and Technology. Quan Liu: School of Computer Science and Technology, Soochow University, and Collaborative Innovation Center of Novel Software Technology and Industrialization. Qiming Fu and Quan Liu are also with the Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University. |
| Copyright: | Springer Science+Business Media, LLC 2017 |
| DOI: | 10.1007/s10586-017-1303-8 |
| Funding: | National Natural Science Foundation of China (61373094, 61472262, 61272005, 61303108, 61672371, 61602334, 61502329, 61502323); Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University (93K172014K04); Foundation of Ministry of Housing and Urban-Rural Development of the People's Republic of China (2015-K1-047); High School Natural Foundation of Jiangsu (BK2012616, 13KJB520020); Natural Science Foundation of Jiangsu (BK20140283); Suzhou Industrial Application of Basic Research Program (SYG201422) |
| Keywords: | Sarsa algorithm; Gradient descent; Function approximation; Bellman residual; Reinforcement learning |
| Subject terms: | Algorithms; Approximation; Computer Communication Networks; Computer Science; Convergence; Distance learning; Fuzzy logic; Machine learning; Mathematical analysis; Operating Systems; Performance enhancement; Processor Architectures |