Curriculum Goal-Conditioned Imitation for Offline Reinforcement Learning

Bibliographic Details
Published in: IEEE Transactions on Games, Vol. 16, No. 1, pp. 102-112
Main Authors: Feng, Xiaoyun; Jiang, Li; Yu, Xudong; Xu, Haoran; Sun, Xiaoyan; Wang, Jie; Zhan, Xianyuan; Chan, Wai Kin
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.03.2024
ISSN: 2475-1502, 2475-1510
Description
Summary: Offline reinforcement learning (RL) enables learning policies from precollected datasets without online data collection. Although it offers the possibility of surpassing the performance of the datasets, most existing offline RL algorithms struggle to compete with behavior cloning policies in many dataset settings, because they must trade off policy improvement against the additional regularization needed to address the distributional shift issue. In many cases, imitating a sequence of suboptimal subtrajectories in the data and properly "stitching" them toward an ideal future state can yield a more reliable policy while avoiding the difficulties present in typical value-based offline RL algorithms. We borrow the idea of curriculum learning to embody this intuition. We construct a curriculum that progressively imitates a sequence of suboptimal trajectories conditioned on a series of carefully constructed future states and cumulative rewards as goals. The suboptimal trajectories gradually guide policy learning toward reaching the ideal goal states. We name our algorithm curriculum goal-conditioned imitation (CGI). Experimental results show that CGI achieves competitive performance against state-of-the-art offline RL algorithms, especially on challenging tasks with long horizons and sparse rewards.
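
As a rough illustration only, and not the paper's actual CGI implementation, the sketch below shows a hypothetical goal-conditioned behavior-cloning step in PyTorch: a policy conditioned on the current state, a future goal state drawn from the same trajectory, and the cumulative reward (return-to-go) toward that goal is trained to imitate the logged action. All class and field names, the network architecture, and the random toy batch are illustrative assumptions.

# Hypothetical sketch (not the paper's CGI code): goal-conditioned behavior cloning,
# where the policy imitates logged actions conditioned on (state, goal state, return-to-go).
import torch
import torch.nn as nn

class GoalConditionedPolicy(nn.Module):
    """pi(a | s, g, R): maps (state, goal state, return-to-go) to an action."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state, goal, rtg):
        return self.net(torch.cat([state, goal, rtg], dim=-1))

def imitation_loss(policy, batch):
    """Mean-squared behavior-cloning loss on actions, conditioned on goal and return-to-go."""
    pred = policy(batch["state"], batch["goal"], batch["rtg"])
    return ((pred - batch["action"]) ** 2).mean()

if __name__ == "__main__":
    # Toy usage with random tensors standing in for an offline dataset batch.
    state_dim, action_dim, n = 4, 2, 128
    policy = GoalConditionedPolicy(state_dim, action_dim)
    opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
    batch = {
        "state": torch.randn(n, state_dim),
        "goal": torch.randn(n, state_dim),     # future state from the same trajectory
        "rtg": torch.randn(n, 1),              # cumulative reward up to the goal
        "action": torch.randn(n, action_dim),  # logged action to imitate
    }
    loss = imitation_loss(policy, batch)
    opt.zero_grad()
    loss.backward()
    opt.step()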
DOI: 10.1109/TG.2022.3224088