Adaptive Dynamic Programming for Optimal Control of Discrete-Time Nonlinear Systems With Trajectory-Based Initial Control Policy

Detailed bibliography
Published in: IEEE Transactions on Systems, Man, and Cybernetics: Systems, Vol. 54, No. 3, pp. 1489–1501
Main authors: Xu, Jiahui; Wang, Jingcheng; Rao, Jun; Wu, Shunyu; Zhong, Yanjiu
Medium: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.03.2024
ISSN: 2168-2216, 2168-2232
Description
Summary: The policy gradient adaptive dynamic programming (PGADP) technique has gained recognition as an effective approach for optimizing the performance of nonlinear systems. However, existing PGADP algorithms often require a large volume of expensive or potentially risky interaction data from the system, and their use of neural networks can lead to poor learning efficiency and unstable training. To address these challenges, a novel algorithm, OptNet-PGADP, is introduced, which integrates an initial control policy tailored through OptNet to solve optimal control problems for discrete-time nonlinear systems. The algorithm operates in two steps. First, an input-output trajectory of the system is computed with the nonlinear model predictive control (NMPC) method. Second, an initial admissible control policy is learned from this trajectory via OptNet and then iteratively improved by the PGADP algorithm until the optimal controller is obtained. The resulting closed-loop control policy can be readily deployed in real-time applications. The implementation uses OptNet as the actor network and incorporates an experience replay mechanism to improve the controller's learning efficiency. A convergence and optimality analysis of the algorithm is also provided. Simulation and experimental results on two nonlinear systems demonstrate that the approach outperforms conventional PGADP and NMPC algorithms, underscoring the efficacy of OptNet-PGADP in overcoming the limitations of current methods and achieving superior control performance for nonlinear systems.
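The two-step workflow described in the summary can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the scalar dynamics, cost, and all hyperparameters are assumptions; the NMPC step is replaced by a one-step lookahead grid search; the OptNet actor is replaced by a linear gain fit by least squares; and the policy gradient is estimated by finite differences over replayed start states.

```python
import numpy as np

np.random.seed(0)

def f(x, u):
    # Illustrative discrete-time nonlinear dynamics x_{k+1} = f(x_k, u_k)
    return 0.9 * np.sin(x) + 0.5 * u

def cost(x, u):
    # Quadratic stage cost (assumed for illustration)
    return x**2 + 0.1 * u**2

# Step 1: collect an input-output trajectory with a crude receding-horizon
# controller (stand-in for NMPC): at each step pick the input minimizing a
# one-step lookahead cost over a candidate grid.
candidates = np.linspace(-2.0, 2.0, 81)
xs, us = [], []
x = 1.5
for _ in range(40):
    u = candidates[np.argmin([cost(x, c) + cost(f(x, c), 0.0) for c in candidates])]
    xs.append(x); us.append(u)
    x = f(x, u)
xs, us = np.array(xs), np.array(us)

# Step 2a: fit an initial admissible policy u = k * x to the trajectory by
# least squares (stand-in for training the OptNet actor on the NMPC data).
k = float(xs @ us / (xs @ xs))

def ret(k, x0, T=30):
    # Closed-loop return of the linear policy u = k * x from start state x0
    J, x = 0.0, x0
    for _ in range(T):
        u = k * x
        J += cost(x, u)
        x = f(x, u)
    return J

# Step 2b: iteratively improve the gain with a finite-difference policy
# gradient on the return, sampling stored start states from the trajectory
# (a minimal analogue of the experience-replay mechanism).
replay = list(xs[:10])
eps, lr = 1e-3, 1e-3
for _ in range(200):
    x0 = replay[np.random.randint(len(replay))]
    grad = (ret(k + eps, x0) - ret(k - eps, x0)) / (2 * eps)
    k -= lr * grad
```

The refined gain yields a lower closed-loop return than the uncontrolled system, mirroring (in miniature) the improvement the paper reports over its baselines.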
DOI: 10.1109/TSMC.2023.3327450