Optimized Backstepping Tracking Control Using Reinforcement Learning for a Class of Stochastic Nonlinear Strict-Feedback Systems

Detailed bibliography
Published in: IEEE Transactions on Neural Networks and Learning Systems, Volume 34, Issue 3, pp. 1291-1303
Main authors: Wen, Guoxing; Xu, Liguang; Li, Bin
Medium: Journal Article
Language: English
Published: United States, The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.03.2023
ISSN: 2162-237X, 2162-2388
Description
Summary: In this article, an optimized backstepping (OB) control scheme is proposed for a class of stochastic nonlinear strict-feedback systems with unknown dynamics by using a reinforcement learning (RL) strategy with an identifier-critic-actor architecture, where the identifier compensates for the unknown dynamics, the critic evaluates the control performance and provides feedback to the actor, and the actor carries out the control action. The basic control idea is that all virtual controls and the actual control of backstepping are designed as the optimized solutions of the corresponding subsystems, so that the entire backstepping control is optimized. Unlike deterministic systems, stochastic system control must account not only for the stochastic disturbance described by the Wiener process but also for the Hessian term in the stability analysis. Developing the backstepping control on the basis of previously published RL optimization methods would be difficult to achieve because, on the one hand, those methods are algorithmically very complex, with critic and actor updating laws derived from the negative gradient of the squared approximation of the Hamilton-Jacobi-Bellman (HJB) equation; on the other hand, they require persistent excitation and known dynamics, where persistent excitation is needed to train the adaptive parameters sufficiently. In this research, both the critic and actor updating laws are derived from the negative gradient of a simple positive function obtained from a partial derivative of the HJB equation. As a result, the RL algorithm is significantly simplified, and the two requirements of persistent excitation and known dynamics are relaxed. The approach is therefore a natural choice for stochastic optimization control. Finally, both theoretical analysis and simulation demonstrate that the proposed control achieves the desired system performance.
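For orientation, the key objects named in the abstract can be sketched as follows; the notation below is a standard assumption introduced here for illustration, since this record does not reproduce the paper's equations. A typical stochastic nonlinear strict-feedback system with a Wiener-process disturbance takes the form

\[
dx_i = \bigl(f_i(\bar{x}_i) + g_i(\bar{x}_i)\,x_{i+1}\bigr)\,dt + h_i^{T}(\bar{x}_i)\,dw, \qquad i = 1,\dots,n-1,
\]
\[
dx_n = \bigl(f_n(\bar{x}_n) + g_n(\bar{x}_n)\,u\bigr)\,dt + h_n^{T}(\bar{x}_n)\,dw, \qquad y = x_1,
\]

where \(\bar{x}_i = [x_1,\dots,x_i]^{T}\), \(u\) is the control input, \(y\) the output, and \(w\) a Wiener process. The Hessian term mentioned in the abstract arises from the Ito differential of a value or Lyapunov function \(V(x)\): writing the system compactly as \(dx = F(x,u)\,dt + h(x)\,dw\),

\[
\mathcal{L}V = \frac{\partial V}{\partial x}\,F(x,u) + \frac{1}{2}\,\operatorname{Tr}\Bigl\{ h^{T}(x)\,\frac{\partial^{2} V}{\partial x^{2}}\,h(x) \Bigr\},
\]

and the trace term has no counterpart in deterministic stability analysis. Likewise, the simplified updating laws described in the abstract can be read, schematically, as negative-gradient flows

\[
\dot{\hat{W}}_{c} = -\gamma_{c}\,\frac{\partial P}{\partial \hat{W}_{c}}, \qquad
\dot{\hat{W}}_{a} = -\gamma_{a}\,\frac{\partial P}{\partial \hat{W}_{a}},
\]

where \(P\) stands for the simple positive function built from a partial derivative of the HJB equation, \(\hat{W}_{c}\) and \(\hat{W}_{a}\) for critic and actor neural-network weights, and \(\gamma_{c}, \gamma_{a}\) for learning gains; these symbols are illustrative and not taken from the paper itself.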
DOI:10.1109/TNNLS.2021.3105176