Data-driven optimal tracking control of discrete-time multi-agent systems with two-stage policy iteration algorithm

Bibliographic details
Published in:	Information Sciences, Vol. 481, pp. 189-202
Main authors:	Peng, Zhinan; Zhao, Yiyi; Hu, Jiangping; Ghosh, Bijoy Kumar
Format:	Journal Article
Language:	English
Published:	Elsevier Inc, 1 May 2019
Subjects:	Actor-critic networks; Data-driven algorithm; Multi-agent systems; Optimal tracking control; Two-stage policy iteration
ISSN:	0020-0255, 1872-6291

Description
Summary:	Herein, a novel adaptive dynamic programming (ADP) algorithm is developed to solve the optimal tracking control problem of discrete-time multi-agent systems. In contrast to the classical policy iteration ADP algorithm, which has two components (policy evaluation and policy improvement), a two-stage policy iteration algorithm is proposed to obtain the iterative control laws and the iterative performance index functions. The proposed algorithm contains a sub-iteration procedure that calculates the iterative performance index functions in the policy evaluation step. Convergence proofs for the iterative performance index functions and the iterative control laws are provided, and the stability of the closed-loop error system is also established. Further, an actor-critic neural network (NN) is used to approximate both the iterative control laws and the iterative performance index functions. The actor-critic NN can implement the developed algorithm online without knowledge of the system dynamics. Finally, simulation results are provided to illustrate the performance of the method.
DOI:	10.1016/j.ins.2018.12.079