An Improved Sarsa(λ) Reinforcement Learning Algorithm for Wireless Communication Systems

In this article, we provide a novel improved model-free temporal-difference control algorithm, namely, Expected Sarsa(λ), using the average value as an update target and introducing eligibility traces in wireless communication networks. In particular, we construct the update target using the average...

Bibliographic details
Published in: IEEE Access, vol. 7, pp. 115418-115427
Main authors: Jiang, Hao; Gui, Renjie; Chen, Zhen; Wu, Liang; Dang, Jian; Zhou, Jie
Format: Journal Article
Language: English
Published: Piscataway: IEEE, 2019
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 2169-3536
Online access: Full text
Abstract In this article, we propose an improved model-free temporal-difference control algorithm for wireless communication networks, namely Expected Sarsa(λ), which uses the average value as the update target and introduces eligibility traces. In particular, we construct the update target from the average action value of all possible successor actions, and we apply eligibility traces to record the visit history of every state-action pair, which greatly improves the model's convergence and learning efficiency. Numerical results demonstrate that, in the tabular case of a finite Markov decision process, the proposed algorithm learns more efficiently and tolerates a wider range of learning rates than Q-learning, Sarsa, Expected Sarsa, and Sarsa(λ), thereby providing an efficient solution for the study and design of wireless communication networks. This provides an efficient and effective solution for designing further artificial-intelligence-based communication networks.
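The update described in the abstract can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: it assumes that the "average action value" is the expectation of the next state's action values under an epsilon-greedy policy, and that the eligibility traces are accumulating traces decayed by gamma * lambda. The function names, the default hyperparameters, and the trace type are all assumptions made for the example; none of them come from this record.

```python
import numpy as np


def epsilon_greedy_probs(q_row, epsilon):
    """Action probabilities of an epsilon-greedy policy for one state (assumed policy)."""
    n_actions = len(q_row)
    probs = np.full(n_actions, epsilon / n_actions)
    probs[int(np.argmax(q_row))] += 1.0 - epsilon
    return probs


def expected_sarsa_lambda_step(Q, E, s, a, r, s_next, done,
                               alpha=0.1, gamma=0.9, lam=0.8, epsilon=0.1):
    """Update the value table Q and the trace table E in place for one transition.

    Q and E are float arrays of shape (n_states, n_actions).
    Returns the temporal-difference error of the transition.
    """
    if done:
        # No successor actions after a terminal transition.
        target = r
    else:
        # Expected (policy-weighted average) action value of the next state.
        probs = epsilon_greedy_probs(Q[s_next], epsilon)
        target = r + gamma * float(np.dot(probs, Q[s_next]))

    delta = target - Q[s, a]

    # Accumulating trace: mark the visited state-action pair.
    E[s, a] += 1.0

    # Every pair is updated in proportion to its trace.
    Q += alpha * delta * E

    if done:
        E[:] = 0.0          # traces do not carry over between episodes
    else:
        E *= gamma * lam    # decay all traces

    return float(delta)
```

In a training loop, `Q` and `E` would start as zero arrays of shape (number of states, number of actions), and the step function would be called once per observed transition. Setting `lam=0` makes the trace of the current pair the only nonzero entry at update time, so the sketch then behaves like one-step Expected Sarsa.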
Author Dang, Jian
Gui, Renjie
Zhou, Jie
Wu, Liang
Chen, Zhen
Jiang, Hao
Author_xml – sequence: 1
  givenname: Hao
  orcidid: 0000-0002-6757-4231
  surname: Jiang
  fullname: Jiang, Hao
  organization: School of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing, China
– sequence: 2
  givenname: Renjie
  surname: Gui
  fullname: Gui, Renjie
  organization: National Mobile Communications Research Laboratory, Southeast University, Nanjing, China
– sequence: 3
  givenname: Zhen
  surname: Chen
  fullname: Chen, Zhen
  organization: School of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing, China
– sequence: 4
  givenname: Liang
  orcidid: 0000-0001-9054-0155
  surname: Wu
  fullname: Wu, Liang
  organization: National Mobile Communications Research Laboratory, Southeast University, Nanjing, China
– sequence: 5
  givenname: Jian
  orcidid: 0000-0002-4199-9645
  surname: Dang
  fullname: Dang, Jian
  organization: National Mobile Communications Research Laboratory, Southeast University, Nanjing, China
– sequence: 6
  givenname: Jie
  surname: Zhou
  fullname: Zhou, Jie
  email: zhoujie@nuist.edu.cn
  organization: School of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing, China
CODEN IAECCG
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2019
DOI 10.1109/ACCESS.2019.2935255
Discipline Engineering
EISSN 2169-3536
EndPage 115427
Genre orig-research
GrantInformation_xml – fundername: National Natural Science Foundation of China
  grantid: 61771248
  funderid: 10.13039/501100001809
– fundername: Major Program of the Natural Science Foundation of Institution of Higher Education of Jiangsu Province
  grantid: 14KJA510001
– fundername: Priority Academic Program Development of Jiangsu Higher Education Institutions
– fundername: National Basic Research Program of China (973 Program); National Key Research and Development Program of China
  grantid: 2018YFB1801101
  funderid: 10.13039/501100012166
– fundername: Startup Foundation for Introducing Talent of NUIST
ISICitedReferencesCount 31
ISSN 2169-3536
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Language English
License https://creativecommons.org/licenses/by/4.0/legalcode
ORCID 0000-0001-9054-0155
0000-0002-4199-9645
0000-0002-6757-4231
OpenAccessLink https://doaj.org/article/6f908aeec50a4173b82996c2e6f43998
PageCount 10
PublicationCentury 2000
PublicationDate 20190000
2019-00-00
20190101
2019-01-01
PublicationDateYYYYMMDD 2019-01-01
PublicationDate_xml – year: 2019
  text: 20190000
PublicationDecade 2010
PublicationPlace Piscataway
PublicationPlace_xml – name: Piscataway
PublicationTitle IEEE access
PublicationTitleAbbrev Access
PublicationYear 2019
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
StartPage 115418
SubjectTerms Q learning
Algorithms
Communication networks
Communications networks
Control algorithms
Control theory
Decision making
eligibility traces
Machine learning
Machine learning algorithms
Markov processes
Mathematical model
Model-free reinforcement learning
Numerical models
Reinforcement learning
Sarsa
Wireless communication
Wireless communication systems
Wireless communications
Wireless networks
Title An Improved Sarsa(λ) Reinforcement Learning Algorithm for Wireless Communication Systems
URI https://ieeexplore.ieee.org/document/8798608
https://www.proquest.com/docview/2455617768
https://doaj.org/article/6f908aeec50a4173b82996c2e6f43998
Volume 7