Recruitment-imitation mechanism for evolutionary reinforcement learning
| Published in: | Information Sciences, Vol. 553, pp. 172-188 |
|---|---|
| Main Authors: | , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Inc., 01.04.2021 |
| ISSN: | 0020-0255, 1872-6291 |
| DOI: | 10.1016/j.ins.2020.12.017 |
| Summary: | Reinforcement learning, evolutionary algorithms, and imitation learning are three principal methods for continuous control tasks. Reinforcement learning is sample efficient yet sensitive to hyperparameter settings and requires efficient exploration; evolutionary algorithms are stable but sample inefficient; imitation learning is both sample efficient and stable, but it requires the guidance of expert data. In this paper, we propose the Recruitment-imitation Mechanism (RIM) for evolutionary reinforcement learning, a scalable framework that combines the advantages of the three methods above. The core of this framework is a dual-actor, single-critic reinforcement learning agent. This agent recruits high-fitness actors from the population driven by the evolutionary algorithm, and these recruited actors guide its learning from the experience replay buffer. At the same time, low-fitness actors in the evolutionary population imitate the behavior patterns of the reinforcement learning agent to raise their fitness. The reinforcement and imitation learners in this framework can be replaced with any off-policy actor-critic reinforcement learner and any data-driven imitation learner. We evaluate RIM on a series of continuous control benchmarks in MuJoCo. The experimental results show that RIM outperforms prior evolutionary and reinforcement learning methods. RIM's components significantly outperform the components of previous evolutionary reinforcement learning algorithms, and recruitment with a soft update enables the reinforcement learning agent to learn faster than recruitment with a hard update. |
|---|---|
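
The summary contrasts hard and soft recruitment updates. As a minimal sketch of what that distinction typically means (the paper's exact update rule and interfaces are not given in this record, so the function names, toy weight shapes, and coefficient `tau` below are illustrative assumptions), a hard update copies the recruited high-fitness actor's weights outright, while a soft, Polyak-style update blends them in gradually:

```python
import numpy as np

def hard_recruit(rl_actor_weights, elite_weights):
    # Hard update: the RL actor's weights are replaced wholesale by the
    # recruited high-fitness (elite) actor's weights.
    return [w.copy() for w in elite_weights]

def soft_recruit(rl_actor_weights, elite_weights, tau=0.1):
    # Soft (Polyak-style) update: blend a fraction tau of the elite actor's
    # weights into the RL actor's weights at each recruitment step. The
    # summary reports that this variant lets the RL agent learn faster
    # than a hard copy.
    return [(1.0 - tau) * w_rl + tau * w_elite
            for w_rl, w_elite in zip(rl_actor_weights, elite_weights)]

# Illustrative usage with toy two-layer weight lists.
rl = [np.zeros((4, 4)), np.zeros(4)]
elite = [np.ones((4, 4)), np.ones(4)]
rl = soft_recruit(rl, elite, tau=0.1)  # rl moves 10% toward the elite actor
```

With `tau = 1` the soft update degenerates to the hard copy, so the two schemes differ only in how abruptly the recruited policy overwrites the learner.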