Reward-Guided Synthesis of Intelligent Agents with Control Structures

Deep reinforcement learning (RL) has led to encouraging successes in numerous challenging robotics applications. However, the lack of inductive biases to support logic deduction and generalization in the representation of a deep RL model causes it less effective in exploring complex long-horizon rob...

Celý popis

Uložené v:
Podrobná bibliografia
Vydané v:Proceedings of ACM on programming languages Ročník 8; číslo PLDI; s. 1730 - 1754
Hlavní autori: Cui, Guofeng, Wang, Yuning, Qiu, Wenjie, Zhu, He
Médium: Journal Article
Jazyk:English
Vydavateľské údaje: New York, NY, USA ACM 20.06.2024
Predmet:
ISSN:2475-1421, 2475-1421
On-line prístup:Získať plný text
Tagy: Pridať tag
Žiadne tagy, Buďte prvý, kto otaguje tento záznam!
Popis
Shrnutí:Deep reinforcement learning (RL) has led to encouraging successes in numerous challenging robotics applications. However, the lack of inductive biases to support logic deduction and generalization in the representation of a deep RL model causes it less effective in exploring complex long-horizon robot-control tasks with sparse reward signals. Existing program synthesis algorithms for RL problems inherit the same limitation, as they either adapt conventional RL algorithms to guide program search or synthesize robot-control programs to imitate an RL model. We propose ReGuS, a reward-guided synthesis paradigm, to unlock the potential of program synthesis to overcome the exploration challenges. We develop a novel hierarchical synthesis algorithm with decomposed search space for loops, on-demand synthesis of conditional statements, and curriculum synthesis for procedure calls, to effectively compress the exploration space for long-horizon, multi-stage, and procedural robot-control tasks that are difficult to address by conventional RL techniques. Experiment results demonstrate that ReGuS significantly outperforms state-of-the-art RL algorithms and standard program synthesis baselines on challenging robot tasks including autonomous driving, locomotion control, and object manipulation. CCS Concepts: • Software and its engineering → Automatic programming.
ISSN:2475-1421
2475-1421
DOI:10.1145/3656447