The impact of data distribution on Q-learning with function approximation.

Bibliographic details
Title:	The impact of data distribution on Q-learning with function approximation.
Authors:	Santos, Pedro P., Carvalho, Diogo S., Sardinha, Alberto, Melo, Francisco S.
Source:	Machine Learning; Sep2024, Vol. 113 Issue 9, p6141-6163, 23p
Subjects:	REINFORCEMENT learning, DATA distribution, APPROXIMATION algorithms, MACHINE learning, DATA quality
Abstract:	We study the interplay between the data distribution and Q-learning-based algorithms with function approximation. We provide a unified theoretical and empirical analysis as to how different properties of the data distribution influence the performance of Q-learning-based algorithms. We connect different lines of research, as well as validate and extend previous results, being primarily focused on offline settings. First, we analyze the impact of the data distribution by using optimization as a tool to better understand which data distributions yield low concentrability coefficients. We motivate high-entropy distributions from a game-theoretical point of view and propose an algorithm to find the optimal data distribution from the point of view of concentrability. Second, from an empirical perspective, we introduce a novel four-state MDP specifically tailored to highlight the impact of the data distribution in the performance of Q-learning-based algorithms with function approximation. Finally, we experimentally assess the impact of the data distribution properties on the performance of two offline Q-learning-based algorithms under different environments. Our results attest to the importance of different properties of the data distribution such as entropy, coverage, and data quality (closeness to optimal policy). [ABSTRACT FROM AUTHOR]
	Copyright of Machine Learning is the property of Springer Nature and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database:	Complementary Index
