Asymptotic Analysis of Sample-Averaged Q-Learning

Detailed description

Bibliographic details
Published in: IEEE Transactions on Information Theory, vol. 71, no. 7, pp. 5601-5619
Main authors: Panda, Saunak Kumar; Liu, Ruiqi; Xiang, Yisha
Format: Journal Article
Language: English
Published: IEEE, 2025-07-01
ISSN: 0018-9448, 1557-9654
Online access: Full text
Abstract Reinforcement learning (RL) has emerged as a key approach for training agents in complex and uncertain environments. Incorporating statistical inference in RL algorithms is essential for understanding and managing uncertainty in model performance. This paper introduces a generalized framework for time-varying batch-averaged Q-learning, termed sample-averaged Q-learning (SA-QL), which extends traditional single-sample Q-learning by aggregating samples of rewards and next states to better account for data variability and uncertainty. We leverage the functional central limit theorem (FCLT) to establish a novel framework that provides insights into the asymptotic normality of the sample-averaged algorithm under mild conditions. Additionally, we develop a random scaling method for interval estimation, enabling the construction of confidence intervals without requiring extra hyperparameters. Extensive numerical experiments across classic stochastic OpenAI Gym environments, including windy gridworld and slippery frozenlake, demonstrate how different batch scheduling strategies affect learning efficiency, coverage rates, and confidence interval widths. This work establishes a unified theoretical foundation for sample-averaged Q-learning, providing insights into effective batch scheduling and statistical inference for RL algorithms.
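The batch-averaging idea in the abstract can be illustrated with a minimal sketch. It assumes a tabular MDP with a generative model (any state-action pair can be sampled on demand), synchronous updates over all pairs, Gaussian reward noise, and a polynomial step size; none of these specifics are stated in this record, and the function name `sample_averaged_q_learning` and its parameters are hypothetical. It is not the authors' implementation.

```python
import numpy as np


def sample_averaged_q_learning(P, R, gamma, n_iters, batch_size, rng,
                               reward_noise=0.1):
    """Tabular Q-learning with a batch-averaged Bellman target (illustrative).

    P: array (S, A, S) of transition probabilities (assumed known simulator).
    R: array (S, A) of mean rewards.
    batch_size: callable t -> int, the time-varying batch schedule B_t.
    """
    n_states, n_actions, _ = P.shape
    Q = np.zeros((n_states, n_actions))
    for t in range(1, n_iters + 1):
        B = batch_size(t)        # number of samples drawn per (s, a) at step t
        eta = 1.0 / t ** 0.6     # assumed polynomial step size, not from the paper
        target = np.empty_like(Q)
        for s in range(n_states):
            for a in range(n_actions):
                # Draw B next states and B noisy rewards for this (s, a) pair.
                next_states = rng.choice(n_states, size=B, p=P[s, a])
                rewards = R[s, a] + reward_noise * rng.standard_normal(B)
                # Average the B sampled Bellman targets instead of using one.
                target[s, a] = np.mean(
                    rewards + gamma * Q[next_states].max(axis=1)
                )
        # Standard stochastic-approximation update toward the averaged target.
        Q = (1.0 - eta) * Q + eta * target
    return Q
```

Setting `batch_size=lambda t: 1` recovers ordinary single-sample Q-learning in this sketch, while a growing schedule such as `lambda t: 1 + t // 100` trades more samples per iteration for a lower-variance target.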
Author Xiang, Yisha
Liu, Ruiqi
Panda, Saunak Kumar
Author_xml – sequence: 1
  givenname: Saunak Kumar
  orcidid: 0009-0008-2399-2387
  surname: Panda
  fullname: Panda, Saunak Kumar
  organization: Department of Industrial and Systems Engineering, University of Houston, Houston, TX, USA
– sequence: 2
  givenname: Ruiqi
  orcidid: 0000-0001-9392-3071
  surname: Liu
  fullname: Liu, Ruiqi
  organization: Department of Mathematics and Statistics, Texas Tech University, Lubbock, TX, USA
– sequence: 3
  givenname: Yisha
  orcidid: 0000-0003-0696-2924
  surname: Xiang
  fullname: Xiang, Yisha
  email: yxiang4@central.uh.edu
  organization: Department of Industrial and Systems Engineering, University of Houston, Houston, TX, USA
CODEN IETTAW
ContentType Journal Article
DOI 10.1109/TIT.2025.3569421
Discipline Engineering
Computer Science
EISSN 1557-9654
EndPage 5619
Genre orig-research
GrantInformation_xml – fundername: National Science Foundation; U.S. National Science Foundation
  grantid: 2305486
  funderid: 10.13039/100000001
ISSN 0018-9448
IsPeerReviewed true
IsScholarly true
Issue 7
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
ORCID 0009-0008-2399-2387
0000-0001-9392-3071
0000-0003-0696-2924
PageCount 19
PublicationCentury 2000
PublicationDate 2025-07-01
PublicationDateYYYYMMDD 2025-07-01
PublicationDate_xml – month: 07
  year: 2025
  text: 2025-07-01
  day: 01
PublicationDecade 2020
PublicationTitle IEEE transactions on information theory
PublicationTitleAbbrev TIT
PublicationYear 2025
Publisher IEEE
Publisher_xml – name: IEEE
StartPage 5601
SubjectTerms Accuracy
Approximation algorithms
asymptotic normality
Convergence
Estimation
Inference algorithms
Q-learning
random scaling
Reinforcement learning (RL)
Reliability
statistical inference
Stochastic processes
Training
Uncertainty
Title Asymptotic Analysis of Sample-Averaged Q-Learning
URI https://ieeexplore.ieee.org/document/11002555
Volume 71