Safe Reinforcement Learning Using Robust MPC

Bibliographic details
Published in:	IEEE Transactions on Automatic Control, vol. 66, no. 8, pp. 3638-3652
Main authors:	Zanon, Mario; Gros, Sebastien
Format:	Journal Article
Language:	English
Published:	New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 1 August 2021
Keywords:	Computational modeling; Data models; Learning; Model accuracy; Numerical models; Optimization; Predictive control; Reinforcement learning (RL); robust model predictive control (MPC); Robustness; Robustness (mathematics); safe policies; Safety; Stability analysis; Uncertainty
ISSN:	0018-9286, 1558-2523

Description
Abstract:	Reinforcement learning (RL) has recently impressed the world with stunning results in various applications. While the potential of RL is now well established, many critical aspects still need to be tackled, including safety and stability issues. These issues, while secondary for the RL community, are central to the control community that has been widely investigating them. Model predictive control (MPC) is one of the most successful control techniques because, among others, of its ability to provide such guarantees even for uncertain constrained systems. Since MPC is an optimization-based technique, optimality has also often been claimed. Unfortunately, the performance of MPC is highly dependent on the accuracy of the model used for predictions. In this article, we propose to combine RL and MPC in order to exploit the advantages of both, and therefore, obtain a controller that is optimal and safe. We illustrate the results with two numerical examples in simulations.
DOI:	10.1109/TAC.2020.3024161