Asymptotic Analysis of Sample-Averaged Q-Learning
Reinforcement learning (RL) has emerged as a key approach for training agents in complex and uncertain environments. Incorporating statistical inference in RL algorithms is essential for understanding and managing uncertainty in model performance. This paper introduces a generalized framework for ti...
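The batch-averaging idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a small tabular MDP with a generative model, synchronous updates over all state-action pairs, and Gaussian reward noise. The function name `sample_averaged_q_learning` and all of its parameters are hypothetical, chosen only for illustration.

```python
import numpy as np


def sample_averaged_q_learning(P, R, gamma, batch_size, step_size,
                               n_iters, reward_noise_std=1.0, seed=0):
    """Illustrative sample-averaged Q-learning on a tabular MDP.

    P: array (S, A, S) of transition probabilities (assumed known simulator).
    R: array (S, A) of mean rewards; Gaussian noise is added to each draw.
    batch_size: callable t -> int, the (possibly time-varying) batch size.
    step_size: callable t -> float, the learning rate at iteration t.
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions, _ = P.shape
    Q = np.zeros((n_states, n_actions))
    for t in range(1, n_iters + 1):
        b = batch_size(t)
        target = np.empty_like(Q)
        for s in range(n_states):
            for a in range(n_actions):
                # Draw a batch of next states and noisy rewards for (s, a).
                next_states = rng.choice(n_states, size=b, p=P[s, a])
                rewards = R[s, a] + rng.normal(0.0, reward_noise_std, size=b)
                # Average the empirical Bellman targets over the batch.
                target[s, a] = np.mean(
                    rewards + gamma * Q[next_states].max(axis=1)
                )
        # Synchronous stochastic-approximation step toward the averaged target.
        Q += step_size(t) * (target - Q)
    return Q
```

With `batch_size=lambda t: 1` the sketch reduces to single-sample synchronous Q-learning. A growing schedule such as `lambda t: t` averages more samples per step at a higher sampling cost, which is the kind of trade-off the abstract's batch scheduling experiments examine.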
| Published in: | IEEE Transactions on Information Theory, Vol. 71, No. 7, pp. 5601-5619 |
|---|---|
| Main authors: | Panda, Saunak Kumar; Liu, Ruiqi; Xiang, Yisha |
| Format: | Journal Article |
| Language: | English |
| Published: | IEEE, 2025-07-01 |
| Subjects: | |
| ISSN: | 0018-9448, 1557-9654 |
| Online access: | Get full text |
| Abstract | Reinforcement learning (RL) has emerged as a key approach for training agents in complex and uncertain environments. Incorporating statistical inference in RL algorithms is essential for understanding and managing uncertainty in model performance. This paper introduces a generalized framework for time-varying batch-averaged Q-learning, termed sample-averaged Q-learning (SA-QL), which extends traditional single-sample Q-learning by aggregating samples of rewards and next states to better account for data variability and uncertainty. We leverage the functional central limit theorem (FCLT) to establish a novel framework that provides insights into the asymptotic normality of the sample-averaged algorithm under mild conditions. Additionally, we develop a random scaling method for interval estimation, enabling the construction of confidence intervals without requiring extra hyperparameters. Extensive numerical experiments across classic stochastic OpenAI Gym environments, including windy gridworld and slippery frozenlake, demonstrate how different batch scheduling strategies affect learning efficiency, coverage rates, and confidence interval widths. This work establishes a unified theoretical foundation for sample-averaged Q-learning, providing insights into effective batch scheduling and statistical inference for RL algorithms. |
|---|---|
| Author | Xiang, Yisha; Liu, Ruiqi; Panda, Saunak Kumar |
| Author_xml | 1. Panda, Saunak Kumar (ORCID: 0009-0008-2399-2387), Department of Industrial and Systems Engineering, University of Houston, Houston, TX, USA; 2. Liu, Ruiqi (ORCID: 0000-0001-9392-3071), Department of Mathematics and Statistics, Texas Tech University, Lubbock, TX, USA; 3. Xiang, Yisha (ORCID: 0000-0003-0696-2924, email: yxiang4@central.uh.edu), Department of Industrial and Systems Engineering, University of Houston, Houston, TX, USA |
| CODEN | IETTAW |
| ContentType | Journal Article |
| DBID | 97E RIA RIE AAYXX CITATION |
| DOI | 10.1109/TIT.2025.3569421 |
| DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005–Present; IEEE All-Society Periodicals Package (ASPP) 1998–Present; IEEE Electronic Library (IEL); CrossRef |
| DatabaseTitle | CrossRef |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Engineering; Computer Science |
| EISSN | 1557-9654 |
| EndPage | 5619 |
| ExternalDocumentID | 10_1109_TIT_2025_3569421 11002555 |
| Genre | orig-research |
| GrantInformation_xml | Funder: National Science Foundation (U.S. National Science Foundation); grant ID: 2305486; funder ID: 10.13039/100000001 |
| ISICitedReferencesCount | 0 |
| ISSN | 0018-9448 |
| IngestDate | Sat Nov 29 07:47:41 EST 2025 Wed Aug 27 01:46:03 EDT 2025 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 7 |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
| ORCID | 0009-0008-2399-2387 0000-0001-9392-3071 0000-0003-0696-2924 |
| PageCount | 19 |
| ParticipantIDs | crossref_primary_10_1109_TIT_2025_3569421 ieee_primary_11002555 |
| PublicationCentury | 2000 |
| PublicationDate | 2025-07-01 |
| PublicationDateYYYYMMDD | 2025-07-01 |
| PublicationDate_xml | – month: 07 year: 2025 text: 2025-07-01 day: 01 |
| PublicationDecade | 2020 |
| PublicationTitle | IEEE transactions on information theory |
| PublicationTitleAbbrev | TIT |
| PublicationYear | 2025 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SourceID | crossref ieee |
| SourceType | Index Database Publisher |
| StartPage | 5601 |
| SubjectTerms | Accuracy; Approximation algorithms; asymptotic normality; Convergence; Estimation; Inference algorithms; Q-learning; random scaling; Reinforcement learning (RL); Reliability; statistical inference; Stochastic processes; Training; Uncertainty |
| Title | Asymptotic Analysis of Sample-Averaged Q-Learning |
| URI | https://ieeexplore.ieee.org/document/11002555 |
| Volume | 71 |