Improved off‐policy reinforcement learning algorithm for robust control of unmodeled nonlinear system with asymmetric state constraints
| Published in: | *International Journal of Robust and Nonlinear Control*, Vol. 33, No. 3, pp. 1607–1632 |
|---|---|
| Main authors: | , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Bognor Regis: Wiley Subscription Services, Inc., 01.02.2023 |
| ISSN: | 1049-8923, 1099-1239 |
| Summary: | In this article, an improved data‐based off‐policy reinforcement learning algorithm is proposed for the robust control of unmodeled nonlinear systems with asymmetric state constraints. An improved nonlinear mapping is defined for the asymmetric state constraint problem, which ensures that the mapped state has a faster response and better amplitude behavior than the original state. Then, an auxiliary mapping error system is constructed for the off‐policy robust controller design. At the same time, an innovative network dimensionality reduction method based on principal component analysis is proposed to prune redundant activation functions of the action network in the off‐policy algorithm, which effectively reduces the computational burden of processing data episodes. To cope with the uncertain data caused by disturbances, a dominant data sampling method is designed to extract the samples that are most beneficial to algorithm convergence. On this basis, the improved off‐policy robust control algorithm is constructed. The effectiveness of the dominant data sampling method and the improved off‐policy robust control algorithm is verified by comparative simulations on an industrial manipulator system. (Illustrative sketches of the nonlinear mapping, the PCA‐based pruning, and the dominant data sampling follow the record below.) |
| Funding: | National Key Research and Development Program of China (2021YFB1714700); National Natural Science Foundation of China (62022061); Natural Science Foundation of Tianjin City (20JCYBJC00880) |
| DOI: | 10.1002/rnc.6432 |
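
The first component named in the abstract is a nonlinear mapping that turns an asymmetrically constrained state into an unconstrained one, so that an off‐policy learner can work on the mapped system. The paper's improved mapping is not reproduced in this record; the sketch below shows the standard logarithmic barrier mapping for a scalar state confined to (−k_a, k_b), the kind of mapping such improvements typically refine. The function name and NumPy implementation are illustrative assumptions, not the authors' code.

```python
import numpy as np

def asymmetric_barrier_map(x, k_a, k_b):
    """Map a state constrained to the asymmetric interval (-k_a, k_b)
    onto the whole real line.

    Standard logarithmic barrier mapping from the state-constraint
    literature (a stand-in for the paper's improved mapping):
    s = 0 at x = 0, and s -> +/- infinity as x approaches a bound.
    """
    assert -k_a < x < k_b, "state must satisfy the asymmetric constraint"
    return np.log((k_b * (k_a + x)) / (k_a * (k_b - x)))

# Example: a state constrained to (-1, 3); the mapped value is finite
# inside the interval and blows up only at the bounds.
s = asymmetric_barrier_map(1.5, k_a=1.0, k_b=3.0)  # ~1.609
```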
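The PCA‐based dimensionality reduction is described as pruning activation functions of the action network that contribute little over the collected data episodes. The abstract does not state the ranking rule, so the sketch below uses a common choice: score each activation column by its loading energy in the principal components that capture a chosen fraction of the variance, and keep the top‐scoring neurons. The function name, the energy threshold, and the keep‐top‐k rule are all assumptions.

```python
import numpy as np

def select_dominant_activations(Phi, energy=0.95):
    """Rank action-network activation functions via PCA and return the
    indices of the dominant subset.

    Phi : (n_samples, n_neurons) matrix of activation outputs recorded
          over data episodes.
    The scoring rule (per-neuron loading energy of the retained
    principal components) is an assumed stand-in for the paper's
    criterion, which the abstract does not specify.
    """
    Phi_c = Phi - Phi.mean(axis=0)                    # center the data
    _, S, Vt = np.linalg.svd(Phi_c, full_matrices=False)
    var_ratio = S**2 / np.sum(S**2)
    # Smallest k whose components explain the requested variance share.
    k = int(np.searchsorted(np.cumsum(var_ratio), energy)) + 1
    # Energy of each neuron's loadings across the k retained components.
    score = np.sum((Vt[:k].T * S[:k])**2, axis=1)
    keep = np.argsort(score)[::-1][:k]                # keep top-k neurons
    return np.sort(keep)
```

Retaining as many neurons as principal components is one simple choice; the retained width could equally be tuned against the quality of the value‐function fit.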
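Finally, dominant data sampling extracts, from disturbance‐corrupted episodes, the samples most useful for convergence of the learning step. The criterion below (discard the samples whose residual under the previous least‐squares fit is largest) is a plausible stand‐in only, since the record does not describe the paper's actual rule; all names are hypothetical.

```python
import numpy as np

def dominant_data_sampling(residuals, keep_ratio=0.7):
    """Return indices of the samples with the smallest fit residuals.

    residuals : per-sample error from the previous policy-evaluation
                fit; large residuals are treated as disturbance-dominated
                and discarded. Both the residual criterion and the fixed
                keep_ratio are assumptions for illustration.
    """
    residuals = np.asarray(residuals)
    n_keep = max(1, int(keep_ratio * residuals.size))
    keep = np.argsort(residuals)[:n_keep]   # smallest residuals first
    return np.sort(keep)

# Example: keep the 70% best-explained samples from one data episode.
idx = dominant_data_sampling([0.1, 2.3, 0.05, 0.4, 1.1])  # -> [0, 2, 3]
```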