An Improved Sarsa(λ) Reinforcement Learning Algorithm for Wireless Communication Systems

In this article, we provide a novel improved model-free temporal-difference control algorithm, namely, Expected Sarsa(λ), using the average value as an update target and introducing eligibility traces in wireless communication networks. In particular, we construct the update target using the average...

Detailed description

Bibliographic details
Published in:	IEEE Access, vol. 7, pp. 115418-115427
Main authors:	Jiang, Hao; Gui, Renjie; Chen, Zhen; Wu, Liang; Dang, Jian; Zhou, Jie
Format:	Journal Article
Language:	English
Published:	Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2019
Subjects:	Q learning; Algorithms; Communication networks; Communications networks; Control algorithms; Control theory; Decision making; eligibility traces; Machine learning; Machine learning algorithms; Markov processes; Mathematical model; Model-free reinforcement learning; Numerical models; Reinforcement learning; Sarsa; Wireless communication; Wireless communication systems; Wireless communications; Wireless networks
ISSN:	2169-3536
Online access:	Full text

Abstract	In this article, we propose an improved model-free temporal-difference control algorithm, Expected Sarsa(λ), for wireless communication networks. The algorithm uses an average value as the update target and introduces eligibility traces. In particular, we construct the update target from the average action value of all possible successor actions and apply eligibility traces to record the visit history of every state-action pair, which greatly improves the model's convergence and learning efficiency. Numerical results demonstrate that, in the tabular case of a finite Markov decision process, the proposed algorithm learns more efficiently and tolerates a wider range of learning rates than Q-learning, Sarsa, Expected Sarsa, and Sarsa(λ). It thereby provides an efficient and effective approach to the study and design of future artificially intelligent wireless communication networks.
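
The update described in the abstract can be illustrated with a minimal tabular sketch. This is not the authors' implementation: it assumes an epsilon-greedy policy for computing the average (expected) next action value and accumulating eligibility traces, and the function names, parameter names, and default values are hypothetical choices made only for illustration.

```python
import numpy as np


def epsilon_greedy_probs(q_row, epsilon):
    """Action probabilities of an epsilon-greedy policy for one state (assumed policy)."""
    n_actions = q_row.shape[0]
    probs = np.full(n_actions, epsilon / n_actions)
    probs[int(np.argmax(q_row))] += 1.0 - epsilon
    return probs


def expected_sarsa_lambda_step(Q, E, s, a, r, s_next, done,
                               alpha=0.1, gamma=0.9, lam=0.8, epsilon=0.1):
    """Apply one in-place tabular update to Q and E; return the TD error.

    Q and E are float arrays of shape (n_states, n_actions).
    """
    if done:
        expected_next = 0.0
    else:
        # Update target: policy-weighted average over all successor actions.
        probs = epsilon_greedy_probs(Q[s_next], epsilon)
        expected_next = float(np.dot(probs, Q[s_next]))

    delta = r + gamma * expected_next - Q[s, a]

    E[s, a] += 1.0            # accumulating trace for the visited pair (assumption)
    Q += alpha * delta * E    # every pair is updated in proportion to its trace

    if done:
        E[:] = 0.0            # traces are reset at the end of an episode
    else:
        E *= gamma * lam      # otherwise all traces decay
    return delta


if __name__ == "__main__":
    Q = np.zeros((3, 2))
    E = np.zeros_like(Q)
    print(expected_sarsa_lambda_step(Q, E, s=0, a=1, r=1.0, s_next=1, done=False))
    print(Q)
```

Because the trace matrix carries credit back to earlier state-action pairs, a single rewarding transition also adjusts the values of recently visited pairs, which is the mechanism the abstract credits for faster learning.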
Authors and affiliations
1. Hao Jiang (ORCID: 0000-0002-6757-4231), School of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing, China
2. Renjie Gui, National Mobile Communications Research Laboratory, Southeast University, Nanjing, China
3. Zhen Chen, School of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing, China
4. Liang Wu (ORCID: 0000-0001-9054-0155), National Mobile Communications Research Laboratory, Southeast University, Nanjing, China
5. Jian Dang (ORCID: 0000-0002-4199-9645), National Mobile Communications Research Laboratory, Southeast University, Nanjing, China
6. Jie Zhou (email: zhoujie@nuist.edu.cn), School of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing, China
CODEN	IAECCG
ContentType	Journal Article
Copyright	Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2019
DOI	10.1109/ACCESS.2019.2935255
Discipline	Engineering
EISSN	2169-3536
EndPage	115427
Genre	orig-research
Grant information
- National Natural Science Foundation of China, grant 61771248 (funder ID: 10.13039/501100001809)
- Major Program of the Natural Science Foundation of Institution of Higher Education of Jiangsu Province, grant 14KJA510001
- Priority Academic Program Development of Jiangsu Higher Education Institutions
- National Basic Research Program of China (973 Program); National Key Research and Development Program of China, grant 2018YFB1801101 (funder ID: 10.13039/501100012166)
- Startup Foundation for Introducing Talent of NUIST
ISICitedReferencesCount	31
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
Language	English
License	https://creativecommons.org/licenses/by/4.0/legalcode
ORCID	0000-0001-9054-0155 0000-0002-4199-9645 0000-0002-6757-4231
OpenAccessLink	https://doaj.org/article/6f908aeec50a4173b82996c2e6f43998
PageCount	10
PublicationDate	2019-01-01
PublicationPlace	Piscataway
PublicationTitle	IEEE Access
PublicationTitleAbbrev	Access
PublicationYear	2019
Publisher	The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
StartPage	115418
Title	An Improved Sarsa(λ) Reinforcement Learning Algorithm for Wireless Communication Systems
URI	https://ieeexplore.ieee.org/document/8798608 https://www.proquest.com/docview/2455617768 https://doaj.org/article/6f908aeec50a4173b82996c2e6f43998
Volume	7