Asymptotic Analysis of Sample-Averaged Q-Learning
| Published in: | IEEE Transactions on Information Theory, Vol. 71, No. 7, pp. 5601-5619 |
|---|---|
| Main Authors: | , , |
| Format: | Journal Article |
| Language: | English |
| Published: | IEEE, 01.07.2025 |
| Subjects: | |
| ISSN: | 0018-9448, 1557-9654 |
| Summary: | Reinforcement learning (RL) has emerged as a key approach for training agents in complex and uncertain environments. Incorporating statistical inference in RL algorithms is essential for understanding and managing uncertainty in model performance. This paper introduces a generalized framework for time-varying batch-averaged Q-learning, termed sample-averaged Q-learning (SA-QL), which extends traditional single-sample Q-learning by aggregating samples of rewards and next states to better account for data variability and uncertainty. We leverage the functional central limit theorem (FCLT) to establish a novel framework that provides insights into the asymptotic normality of the sample-averaged algorithm under mild conditions. Additionally, we develop a random scaling method for interval estimation, enabling the construction of confidence intervals without requiring extra hyperparameters. Extensive numerical experiments across classic stochastic OpenAI Gym environments, including windy gridworld and slippery frozenlake, demonstrate how different batch scheduling strategies affect learning efficiency, coverage rates, and confidence interval widths. This work establishes a unified theoretical foundation for sample-averaged Q-learning, providing insights into effective batch scheduling and statistical inference for RL algorithms. |
|---|---|
| ISSN: | 0018-9448, 1557-9654 |
| DOI: | 10.1109/TIT.2025.3569421 |
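
The summary above describes SA-QL as replacing the single-sample Bellman target of standard Q-learning with an average over a time-varying batch of sampled rewards and next states. The sketch below illustrates that idea for a tabular Q-table, assuming a generative-model-style setting where multiple transitions can be resampled from a fixed state-action pair; the function names, the polynomial batch schedule, and all parameter values are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def sa_q_update(Q, s, a, samples, gamma, alpha):
    """One sample-averaged Q-learning update for the state-action pair (s, a).

    `samples` is a batch of (reward, next_state) pairs drawn from (s, a).
    The empirical Bellman targets are averaged over the batch before taking
    a single stochastic-approximation step toward that average.
    """
    targets = [r + gamma * np.max(Q[s_next]) for r, s_next in samples]
    Q[s, a] += alpha * (np.mean(targets) - Q[s, a])
    return Q

def batch_size(t, c=1.0, rho=0.5):
    # Illustrative polynomial batch schedule n_t = ceil(c * (t + 1)^rho).
    # The paper studies how different scheduling strategies affect learning
    # efficiency and confidence-interval width; this particular form is an
    # assumption for demonstration only.
    return max(1, int(np.ceil(c * (t + 1) ** rho)))

# Example: 5 states x 4 actions, resampling 3 transitions from (s, a) = (2, 1).
Q = np.zeros((5, 4))
batch = [(1.0, 3), (0.0, 2), (1.0, 3)]   # (reward, next_state) pairs
Q = sa_q_update(Q, s=2, a=1, samples=batch, gamma=0.99, alpha=0.1)
```

With a batch size of one this reduces to ordinary single-sample Q-learning, which is the baseline the sample-averaged variant generalizes.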