Model-Based Safe Reinforcement Learning With Time-Varying Constraints: Applications to Intelligent Vehicles

In recent years, safe reinforcement learning (RL) with the actor-critic structure has gained significant interest for continuous control tasks. However, achieving near-optimal control policies with safety and convergence guarantees remains challenging. Moreover, few works have focused on designing R...

Celý popis

Uloženo v:
Podrobná bibliografie
Vydáno v:IEEE transactions on industrial electronics (1982) Ročník 71; číslo 10; s. 12744 - 12753
Hlavní autoři: Zhang, Xinglong, Peng, Yaoqian, Luo, Biao, Pan, Wei, Xu, Xin, Xie, Haibin
Médium: Journal Article
Jazyk:angličtina
Vydáno: New York IEEE 01.10.2024
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Témata:
ISSN:0278-0046, 1557-9948
On-line přístup:Získat plný text
Tagy: Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
Popis
Shrnutí:In recent years, safe reinforcement learning (RL) with the actor-critic structure has gained significant interest for continuous control tasks. However, achieving near-optimal control policies with safety and convergence guarantees remains challenging. Moreover, few works have focused on designing RL algorithms that handle time-varying safety constraints. This article proposes a safe RL algorithm for optimal control of nonlinear systems with time-varying state and control constraints. The algorithm's novelty lies in two key aspects. Firstly, the approach introduces a unique barrier force-based control policy structure to ensure control safety during learning. Secondly, a multistep policy evaluation mechanism is employed, enabling the prediction of policy safety risks under time-varying constraints and guiding safe updates. Theoretical results on learning convergence, stability, and robustness are proven. The proposed algorithm outperforms several state-of-the-art RL algorithms in the simulated Safety Gym environment. It is also applied to the real-world problem of integrated path following and collision avoidance for two intelligent vehicles-a differential-drive vehicle and an Ackermann-drive one. The experimental results demonstrate the impressive sim-to-real transfer capability of our approach, while showcasing satisfactory online control performance.
Bibliografie:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:0278-0046
1557-9948
DOI:10.1109/TIE.2023.3317853