Asymptotic Analysis of Sample-Averaged Q-Learning

Reinforcement learning (RL) has emerged as a key approach for training agents in complex and uncertain environments. Incorporating statistical inference in RL algorithms is essential for understanding and managing uncertainty in model performance. This paper introduces a generalized framework for ti...

Celý popis

Uloženo v:
Podrobná bibliografie
Vydáno v:IEEE transactions on information theory Ročník 71; číslo 7; s. 5601 - 5619
Hlavní autoři: Panda, Saunak Kumar, Liu, Ruiqi, Xiang, Yisha
Médium: Journal Article
Jazyk:angličtina
Vydáno: IEEE 01.07.2025
Témata:
ISSN:0018-9448, 1557-9654
On-line přístup:Získat plný text
Tagy: Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
Popis
Shrnutí:Reinforcement learning (RL) has emerged as a key approach for training agents in complex and uncertain environments. Incorporating statistical inference in RL algorithms is essential for understanding and managing uncertainty in model performance. This paper introduces a generalized framework for time-varying batch-averaged Q-learning, termed sample-averaged Q-learning (SA-QL), which extends traditional single-sample Q-learning by aggregating samples of rewards and next states to better account for data variability and uncertainty. We leverage the functional central limit theorem (FCLT) to establish a novel framework that provides insights into the asymptotic normality of the sample-averaged algorithm under mild conditions. Additionally, we develop a random scaling method for interval estimation, enabling the construction of confidence intervals without requiring extra hyperparameters. Extensive numerical experiments across classic stochastic OpenAI Gym environments, including windy gridworld and slippery frozenlake, demonstrate how different batch scheduling strategies affect learning efficiency, coverage rates, and confidence interval widths. This work establishes a unified theoretical foundation for sample-averaged Q-learning, providing insights into effective batch scheduling and statistical inference for RL algorithms.
ISSN:0018-9448
1557-9654
DOI:10.1109/TIT.2025.3569421