An Improved Sarsa(λ) Reinforcement Learning Algorithm for Wireless Communication Systems
In this article, we provide a novel improved model-free temporal-difference control algorithm, namely, Expected Sarsa(λ), using the average value as an update target and introducing eligibility traces in wireless communication networks. In particular, we construct the update target using the average...
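The update described above can be illustrated with a minimal tabular sketch. It is not the authors' implementation: the record does not specify the behavior policy or the trace type, so an epsilon-greedy policy and accumulating traces are assumed here, and the names `epsilon_greedy_probs` and `expected_sarsa_lambda_step` and all parameter defaults are hypothetical.

```python
import numpy as np


def epsilon_greedy_probs(q_row, epsilon):
    """Action probabilities of an epsilon-greedy policy for one state (assumed policy)."""
    n_actions = q_row.shape[0]
    probs = np.full(n_actions, epsilon / n_actions)
    probs[np.argmax(q_row)] += 1.0 - epsilon
    return probs


def expected_sarsa_lambda_step(Q, E, s, a, r, s_next, done,
                               alpha=0.1, gamma=0.9, lam=0.8, epsilon=0.1):
    """Apply one in-place update to the value table Q and the trace table E."""
    # Update target: reward plus the policy-weighted average of the
    # next state's action values (zero if the episode has ended).
    if done:
        expected_next = 0.0
    else:
        expected_next = float(epsilon_greedy_probs(Q[s_next], epsilon) @ Q[s_next])
    delta = r + gamma * expected_next - Q[s, a]

    # Accumulating eligibility trace (assumed): mark the visited pair.
    E[s, a] += 1.0
    # Every state-action pair is updated in proportion to its trace.
    Q += alpha * delta * E
    # Decay all traces; clear them at the end of an episode.
    E *= gamma * lam
    if done:
        E[:] = 0.0
    return delta
```

Here `Q` and `E` are arrays of shape `(n_states, n_actions)` that the caller creates, for example with `np.zeros`, and the function is called once per observed transition. Because the error is applied through `E`, a reward received late in an episode also adjusts the values of pairs visited earlier.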
| Published in: | IEEE Access, vol. 7, pp. 115418-115427 |
|---|---|
| Main authors: | Hao Jiang, Renjie Gui, Zhen Chen, Liang Wu, Jian Dang, Jie Zhou |
| Format: | Journal Article |
| Language: | English |
| Published: | Piscataway: IEEE, 2019. The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| ISSN: | 2169-3536 |
| Online access: | Full text |
| Abstract | In this article, we propose an improved model-free temporal-difference control algorithm for wireless communication networks, Expected Sarsa(λ), which uses the average value as the update target and introduces eligibility traces. In particular, we construct the update target from the average action value over all possible successor actions and apply eligibility traces to record the visit history of every state-action pair, which greatly improves the model's convergence and learning efficiency. Numerical results demonstrate that, in the tabular case of a finite Markov decision process, the proposed algorithm learns more efficiently and tolerates a wider range of learning rates than Q-learning, Sarsa, Expected Sarsa, and Sarsa(λ). It thereby provides an efficient and effective solution for the study and design of future artificial-intelligence-based communication networks. |
|---|---|
| Author | Jiang, Hao; Gui, Renjie; Chen, Zhen; Wu, Liang; Dang, Jian; Zhou, Jie |
| Author_xml | 1. Jiang, Hao (ORCID 0000-0002-6757-4231), School of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing, China; 2. Gui, Renjie, National Mobile Communications Research Laboratory, Southeast University, Nanjing, China; 3. Chen, Zhen, School of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing, China; 4. Wu, Liang (ORCID 0000-0001-9054-0155), National Mobile Communications Research Laboratory, Southeast University, Nanjing, China; 5. Dang, Jian (ORCID 0000-0002-4199-9645), National Mobile Communications Research Laboratory, Southeast University, Nanjing, China; 6. Zhou, Jie (zhoujie@nuist.edu.cn), School of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing, China |
| CODEN | IAECCG |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2019 |
| DOI | 10.1109/ACCESS.2019.2935255 |
| Discipline | Engineering |
| EISSN | 2169-3536 |
| EndPage | 115427 |
| Genre | orig-research |
| GrantInformation_xml | 1. National Natural Science Foundation of China (grant 61771248, funder ID 10.13039/501100001809); 2. Major Program of the Natural Science Foundation of Institution of Higher Education of Jiangsu Province (grant 14KJA510001); 3. Priority Academic Program Development of Jiangsu Higher Education Institutions; 4. National Basic Research Program of China (973 Program) and National Key Research and Development Program of China (grant 2018YFB1801101, funder ID 10.13039/501100012166); 5. Startup Foundation for Introducing Talent of NUIST |
| ISICitedReferencesCount | 31 |
| ISSN | 2169-3536 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Language | English |
| License | https://creativecommons.org/licenses/by/4.0/legalcode |
| ORCID | 0000-0001-9054-0155 0000-0002-4199-9645 0000-0002-6757-4231 |
| OpenAccessLink | https://doaj.org/article/6f908aeec50a4173b82996c2e6f43998 |
| PageCount | 10 |
| PublicationDate | 2019-01-01 |
| PublicationPlace | Piscataway |
| PublicationTitle | IEEE Access |
| PublicationTitleAbbrev | Access |
| PublicationYear | 2019 |
| Publisher | IEEE; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| StartPage | 115418 |
| SubjectTerms | Q learning; Algorithms; Communication networks; Communications networks; Control algorithms; Control theory; Decision making; Eligibility traces; Machine learning; Machine learning algorithms; Markov processes; Mathematical model; Model-free reinforcement learning; Numerical models; Reinforcement learning; Sarsa; Wireless communication; Wireless communication systems; Wireless communications; Wireless networks |
| Title | An Improved Sarsa(λ) Reinforcement Learning Algorithm for Wireless Communication Systems |
| URI | https://ieeexplore.ieee.org/document/8798608 https://www.proquest.com/docview/2455617768 https://doaj.org/article/6f908aeec50a4173b82996c2e6f43998 |
| Volume | 7 |