Asymptotic Analysis of Sample-Averaged Q-Learning

Reinforcement learning (RL) has emerged as a key approach for training agents in complex and uncertain environments. Incorporating statistical inference in RL algorithms is essential for understanding and managing uncertainty in model performance. This paper introduces a generalized framework for ti...

Full description

Saved in:
Bibliographic Details
Published in:IEEE transactions on information theory Vol. 71; no. 7; pp. 5601 - 5619
Main Authors: Panda, Saunak Kumar, Liu, Ruiqi, Xiang, Yisha
Format: Journal Article
Language:English
Published: IEEE 01.07.2025
Subjects:
ISSN:0018-9448, 1557-9654
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Reinforcement learning (RL) has emerged as a key approach for training agents in complex and uncertain environments. Incorporating statistical inference in RL algorithms is essential for understanding and managing uncertainty in model performance. This paper introduces a generalized framework for time-varying batch-averaged Q-learning, termed sample-averaged Q-learning (SA-QL), which extends traditional single-sample Q-learning by aggregating samples of rewards and next states to better account for data variability and uncertainty. We leverage the functional central limit theorem (FCLT) to establish a novel framework that provides insights into the asymptotic normality of the sample-averaged algorithm under mild conditions. Additionally, we develop a random scaling method for interval estimation, enabling the construction of confidence intervals without requiring extra hyperparameters. Extensive numerical experiments across classic stochastic OpenAI Gym environments, including windy gridworld and slippery frozenlake, demonstrate how different batch scheduling strategies affect learning efficiency, coverage rates, and confidence interval widths. This work establishes a unified theoretical foundation for sample-averaged Q-learning, providing insights into effective batch scheduling and statistical inference for RL algorithms.
ISSN:0018-9448
1557-9654
DOI:10.1109/TIT.2025.3569421