Optimized Backstepping Consensus Control Using Reinforcement Learning for a Class of Nonlinear Strict-Feedback-Dynamic Multi-Agent Systems
| Published in: | IEEE Transactions on Neural Networks and Learning Systems, Vol. 34, no. 3, pp. 1524-1536 |
|---|---|
| Main Authors: | , |
| Format: | Journal Article |
| Language: | English |
| Published: | United States: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.03.2023 |
| Subjects: | |
| ISSN: | 2162-237X, 2162-2388 |
| Summary: | In this article, an optimized leader-following consensus control scheme is proposed for nonlinear strict-feedback-dynamic multi-agent systems, building on the idea of the optimized backstepping technique, which designs the virtual and actual controls of backstepping as the optimized solutions of the corresponding subsystems so that the entire backstepping control is optimized. Since this control must not only ensure optimal system performance but also synchronize the multiple system state variables, it is an interesting and challenging topic. To achieve this optimized control, neural network approximation-based reinforcement learning (RL) is performed under a critic-actor architecture. In most existing RL-based optimal controls, both the critic and actor RL updating laws are derived from the negative gradient of the square of the Hamilton-Jacobi-Bellman (HJB) equation's approximation, which contains multiple nonlinear terms, so the resulting algorithms are inevitably intricate. The proposed optimized control, however, derives the RL updating laws from the negative gradient of a simple positive function that is correlated with the HJB equation; hence, the algorithm is significantly simpler. Meanwhile, it also relaxes two common requirements, known system dynamics and persistence of excitation, which are imposed in most RL-based optimal controls. Therefore, the proposed optimized scheme is a natural choice for high-order nonlinear multi-agent control. Finally, the effectiveness is demonstrated by both theory and simulation. |
|---|---|
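The summary describes critic-actor RL updating laws driven by the negative gradient of a function tied to the HJB equation. As a rough illustration of the general critic-actor idea for a single scalar subsystem (not the authors' exact laws: the basis functions, gains, and the normalized squared-residual critic update below are all assumptions for illustration), a minimal Python sketch:

```python
# Minimal critic-actor RL sketch for one scalar subsystem.
# Illustrative only: the paper instead descends a simpler positive
# function correlated with the HJB equation; this sketch uses the
# more common normalized squared-residual update as a stand-in.
import numpy as np

def grad_features(x):
    # Gradient of an assumed polynomial basis phi(x) = [x^2, x^4],
    # so the critic models V(x) ~ Wc^T phi(x).
    return np.array([2.0 * x, 4.0 * x**3])

f = lambda x: -x + 0.5 * np.sin(x)  # drift, used here only to simulate
Q, R = 1.0, 1.0                     # running cost Q*x^2 + R*u^2
kc, ka = 2.0, 1.0                   # assumed critic/actor learning rates
dt, T = 1e-3, 10.0

Wc = np.ones(2)  # critic weights
Wa = np.ones(2)  # actor weights; u(x) ~ -(1/(2R)) * Wa^T dphi/dx
x = 1.5

for _ in range(int(T / dt)):
    dphi = grad_features(x)
    u = -(1.0 / (2.0 * R)) * (Wa @ dphi)  # actor control
    sigma = dphi * (f(x) + u)             # critic regressor
    e = Q * x**2 + R * u**2 + Wc @ sigma  # HJB (Bellman) residual
    # Normalized gradient-descent critic update on (1/2)e^2.
    Wc -= kc * e * sigma / (1.0 + sigma @ sigma) * dt
    # Actor update: pull actor weights toward the critic's estimate.
    Wa -= ka * (Wa - Wc) * dt
    x += (f(x) + u) * dt                  # integrate the closed loop

print(f"final x = {x:.4f}, Wc = {Wc}, Wa = {Wa}")
```

Note that this sketch uses the drift f(x) in the critic regressor, i.e., it assumes known dynamics; the paper's contribution is precisely that its updating laws avoid both this known-dynamics requirement and the persistence-of-excitation condition. In the full backstepping design, one such critic-actor pair would be attached to each subsystem, with the optimized virtual control of each step feeding the next.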
| ISSN: | 2162-237X, 2162-2388 |
| DOI: | 10.1109/TNNLS.2021.3105548 |