Improving the Performance of Batch-Constrained Reinforcement Learning in Continuous Action Domains via Generative Adversarial Networks


Detailed Bibliography
Published in: 2022 30th Signal Processing and Communications Applications Conference (SIU), pp. 1-4
Main authors: Saglam, Baturay; Dalmaz, Onat; Gonc, Kaan; Kozat, Suleyman S.
Format: Conference paper
Language: English, Turkish
Publication details: IEEE, 15 May 2022
Description
Summary: The Batch-Constrained deep Q-learning (BCQ) algorithm has been shown to overcome extrapolation error and enable deep reinforcement learning agents to learn from a previously collected, fixed batch of transitions. However, because of the conditional Variational Autoencoder (VAE) used in its data generation module, BCQ optimizes a variational lower bound and hence does not generalize well to environments with large state and action spaces. In this paper, we show that the performance of BCQ can be further improved by employing one of the recent advances in deep learning, Generative Adversarial Networks (GANs). Our extensive set of experiments shows that the introduced approach significantly improves BCQ on all of the control tasks tested, and that it generalizes robustly to environments with large state and action spaces in the OpenAI Gym control suite.
DOI:10.1109/SIU55565.2022.9864786
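The approach summarized above swaps the conditional VAE in BCQ's data generation module for a conditional GAN that samples batch-like actions given a state. The following is a minimal sketch of that idea, assuming PyTorch; the class and parameter names (Generator, Discriminator, noise_dim) and the network sizes are illustrative choices, not the authors' implementation.

    import torch
    import torch.nn as nn

    class Generator(nn.Module):
        """Maps (state, noise) to an action; plays the role of the VAE decoder."""
        def __init__(self, state_dim, action_dim, noise_dim=32, max_action=1.0):
            super().__init__()
            self.noise_dim = noise_dim
            self.max_action = max_action
            self.net = nn.Sequential(
                nn.Linear(state_dim + noise_dim, 256), nn.ReLU(),
                nn.Linear(256, 256), nn.ReLU(),
                nn.Linear(256, action_dim), nn.Tanh(),
            )

        def forward(self, state):
            # Sample fresh noise per state so repeated calls give diverse actions.
            z = torch.randn(state.size(0), self.noise_dim, device=state.device)
            return self.max_action * self.net(torch.cat([state, z], dim=1))

    class Discriminator(nn.Module):
        """Scores (state, action) pairs: batch-like (real) vs. generated (fake)."""
        def __init__(self, state_dim, action_dim):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
                nn.Linear(256, 256), nn.ReLU(),
                nn.Linear(256, 1),  # raw logit
            )

        def forward(self, state, action):
            return self.net(torch.cat([state, action], dim=1))

    def gan_step(G, D, opt_G, opt_D, state, action):
        """One adversarial update on a mini-batch drawn from the fixed buffer."""
        bce = nn.BCEWithLogitsLoss()
        real = torch.ones(state.size(0), 1, device=state.device)
        fake = torch.zeros(state.size(0), 1, device=state.device)
        # Discriminator: separate real batch actions from generated ones.
        d_loss = bce(D(state, action), real) + bce(D(state, G(state).detach()), fake)
        opt_D.zero_grad(); d_loss.backward(); opt_D.step()
        # Generator: produce actions the discriminator labels as real.
        g_loss = bce(D(state, G(state)), real)
        opt_G.zero_grad(); g_loss.backward(); opt_G.step()

In a full BCQ training loop, such a generator would simply take the VAE sampler's place: candidate actions drawn from Generator(state) would be perturbed and scored by the Q-networks as usual, so only the sampling model changes. This avoids the variational lower bound that the abstract identifies as the source of BCQ's limited generalizability.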