Model-Based Safe Reinforcement Learning With Time-Varying Constraints: Applications to Intelligent Vehicles

Bibliographic Details
Published in:	IEEE Transactions on Industrial Electronics (1982), Vol. 71, No. 10, pp. 12744-12753
Main Authors:	Zhang, Xinglong; Peng, Yaoqian; Luo, Biao; Pan, Wei; Xu, Xin; Xie, Haibin
Format:	Journal Article
Language:	English
Published:	New York: IEEE, 1 October 2024; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:	Algorithms; Barrier force; Collision avoidance; Control tasks; Convergence; Heuristic algorithms; Intelligent vehicles; Machine learning; Multistep policy evaluation; Nonlinear control; Nonlinear systems; Optimal control; Reinforcement learning; Safe reinforcement learning (RL); Safety; Time-varying constraints; Time-varying systems; Trajectory planning; Vehicle dynamics
ISSN:	0278-0046, 1557-9948
Online Access:	Full text

Description
Abstract:	In recent years, safe reinforcement learning (RL) with the actor-critic structure has gained significant interest for continuous control tasks. However, achieving near-optimal control policies with safety and convergence guarantees remains challenging. Moreover, few works have focused on designing RL algorithms that handle time-varying safety constraints. This article proposes a safe RL algorithm for optimal control of nonlinear systems with time-varying state and control constraints. The algorithm's novelty lies in two key aspects. Firstly, the approach introduces a unique barrier force-based control policy structure to ensure control safety during learning. Secondly, a multistep policy evaluation mechanism is employed, enabling the prediction of policy safety risks under time-varying constraints and guiding safe updates. Theoretical results on learning convergence, stability, and robustness are proven. The proposed algorithm outperforms several state-of-the-art RL algorithms in the simulated Safety Gym environment. It is also applied to the real-world problem of integrated path following and collision avoidance for two intelligent vehicles: a differential-drive vehicle and an Ackermann-drive vehicle. The experimental results demonstrate the impressive sim-to-real transfer capability of our approach, while showcasing satisfactory online control performance.
DOI:	10.1109/TIE.2023.3317853
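
Illustrative sketch (not taken from the article): the abstract names two ideas, a barrier force-based control policy and a multistep prediction of safety risk under time-varying constraints. The toy Python example below shows one simple way such ideas can be expressed for a scalar system. Everything in it is an assumption made for illustration: the log-barrier form, the one-dimensional integrator model, the shrinking bounds, the gain value, and all function names are hypothetical. The rollout check is a plain forward simulation, used here as a stand-in; it is not the article's multistep policy evaluation mechanism and carries none of its guarantees.

```python
from typing import Callable, Tuple

Bounds = Callable[[int], Tuple[float, float]]   # time step -> (lower, upper)
Policy = Callable[[float, int], float]          # (state, time step) -> control
Step = Callable[[float, float], float]          # (state, control) -> next state


def barrier_gradient(x: float, lo: float, hi: float) -> float:
    """Derivative of the log barrier B(x) = -log(x - lo) - log(hi - x).

    Only defined for lo < x < hi; it grows without bound near either limit.
    """
    return -1.0 / (x - lo) + 1.0 / (hi - x)


def make_barrier_policy(nominal: Policy, bounds: Bounds, gain: float) -> Policy:
    """Wrap a nominal policy with a repulsive term -gain * dB/dx (assumed form)."""
    def policy(x: float, t: int) -> float:
        lo, hi = bounds(t)
        return nominal(x, t) - gain * barrier_gradient(x, lo, hi)
    return policy


def rollout_is_safe(x0: float, t0: int, horizon: int,
                    policy: Policy, bounds: Bounds, step: Step) -> bool:
    """Simulate `horizon` steps ahead and report whether the state stays
    strictly inside the time-varying bounds at every predicted step."""
    x = x0
    for t in range(t0, t0 + horizon):
        lo, hi = bounds(t)
        if not (lo < x < hi):
            return False
        x = step(x, policy(x, t))
    lo, hi = bounds(t0 + horizon)
    return lo < x < hi


# Hypothetical toy setup: the upper bound shrinks by 0.05 per step.
def shrinking_bounds(t: int) -> Tuple[float, float]:
    return (-1.0, 1.0 - 0.05 * t)


def integrator_step(x: float, u: float) -> float:
    return x + 0.1 * u          # x_{k+1} = x_k + dt * u_k with dt = 0.1


def push_up(x: float, t: int) -> float:
    return 1.0                  # nominal policy that ignores the constraint


guarded = make_barrier_policy(push_up, shrinking_bounds, gain=0.5)

if __name__ == "__main__":
    print(rollout_is_safe(0.0, 0, 10, push_up, shrinking_bounds, integrator_step))
    print(rollout_is_safe(0.0, 0, 10, guarded, shrinking_bounds, integrator_step))
```

In this toy setup the unguarded policy drives the state past the shrinking upper bound within the 10-step horizon, so the rollout check rejects it, while the barrier-wrapped policy stays inside the bounds over the same horizon. A discrete-time barrier term of this kind does not guarantee safety in general; with a larger step size or a smaller gain the guarded policy can still cross the boundary, which is why the predictive check is kept as a separate test.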