Optimal Time-Varying Q-Learning Algorithm for Affine Nonlinear Systems With Coupled Players

Detailed bibliography
Published in:	IEEE transactions on systems, man, and cybernetics. Systems, Vol. 55; No. 10; pp. 7037-7047
Main authors:	Zhang, Huaguang, Yu, Shuhang, Sun, Jiayue, Li, Mei
Medium:	Journal Article
Language:	English
Published:	IEEE 01.10.2025
Subjects:	Adaptive dynamic programming (ADP); Couplings; Differential games; finite-horizon; Games; Heuristic algorithms; mixed H₂/H∞ control; Nash equilibrium; neural network (NN); Optimal control; Q-learning; System dynamics; Time-varying systems
ISSN:	2168-2216, 2168-2232

Description
Abstract:	To address the finite-horizon coupled two-player mixed H₂/H∞ control problem for a continuous-time affine nonlinear system, this research introduces a distinctive Q-function and presents a new adaptive dynamic programming (ADP) method that does not rely on system-specific information. First, the time-varying Hamilton-Jacobi-Isaacs (HJI) equations are formulated; they are difficult to solve because they are both time-dependent and nonlinear. Next, a novel offline policy iteration (PI) algorithm is introduced, its convergence is established, and the existence of Nash equilibrium points is rigorously proved. Moreover, a novel action-dependent Q-function is constructed to enable fully model-free learning, representing the first treatment of the mixed H₂/H∞ control problem involving coupled players. Lyapunov's direct method is employed to establish the stability of the closed-loop uncertain affine nonlinear system under the ADP-based control scheme, guaranteeing uniform ultimate boundedness (UUB). Finally, a numerical simulation is conducted to validate the effectiveness of the proposed ADP-based control approach.
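The abstract mentions an offline policy iteration (PI) scheme for time-varying HJI equations, but this record does not reproduce the paper's equations. The following is therefore only a minimal sketch of the general idea on a much simpler, hypothetical special case: a scalar linear system dx/dt = a*x + b*u + d*w with a quadratic cost and a zero-sum (H∞-type) game between a control u and a disturbance w. In that case the finite-horizon value function is V(x, t) = p(t)*x², and the HJI equation reduces to a Riccati ODE integrated backward from the terminal time. This is not the authors' algorithm: it assumes a known model (unlike the model-free Q-learning scheme in the abstract) and ignores the coupled mixed H₂/H∞ structure. All names (ScalarGame, policy_iteration, riccati_backward) and parameter values are illustrative assumptions.

```python
# Hypothetical illustration only; NOT the method of Zhang et al.
# Scalar zero-sum LQ game on a finite horizon, explicit Euler time grid.
from dataclasses import dataclass


@dataclass
class ScalarGame:
    a: float      # drift: dx/dt = a*x + b*u + d*w
    b: float      # control input gain
    d: float      # disturbance input gain
    q: float      # state weight in the cost rate q*x^2 + r*u^2 - gamma^2*w^2
    r: float      # control weight
    gamma: float  # disturbance attenuation level
    F: float      # terminal weight, V(x, T) = F * x**2
    T: float      # horizon length
    N: int        # number of Euler steps on [0, T]


def riccati_backward(g: ScalarGame) -> list[float]:
    """Integrate -dp/dt = 2*a*p + q - s*p**2 backward from p(T) = F."""
    dt = g.T / g.N
    s = g.b ** 2 / g.r - g.d ** 2 / g.gamma ** 2
    p = [0.0] * (g.N + 1)
    p[g.N] = g.F
    for k in range(g.N, 0, -1):
        p[k - 1] = p[k] + dt * (2 * g.a * p[k] + g.q - s * p[k] ** 2)
    return p


def policy_iteration(g: ScalarGame, tol: float = 1e-10,
                     max_iter: int = 50) -> tuple[list[float], int]:
    """Alternate policy evaluation and improvement on the time grid.

    Policies are u = -ku[k]*x and w = kw[k]*x at grid point t_k,
    so the gains (and the value p) are time-varying.
    """
    dt = g.T / g.N
    ku = [0.0] * (g.N + 1)  # initial control gains
    kw = [0.0] * (g.N + 1)  # initial disturbance gains
    p_old = None
    p = [g.F] * (g.N + 1)
    for it in range(1, max_iter + 1):
        # Policy evaluation: linear backward ODE for p(t) under fixed gains.
        p = [0.0] * (g.N + 1)
        p[g.N] = g.F
        for k in range(g.N, 0, -1):
            acl = g.a - g.b * ku[k] + g.d * kw[k]  # closed-loop drift
            cost = g.q + g.r * ku[k] ** 2 - g.gamma ** 2 * kw[k] ** 2
            p[k - 1] = p[k] + dt * (2 * acl * p[k] + cost)
        # Policy improvement: u minimizes, w maximizes the Hamiltonian.
        ku = [g.b * pk / g.r for pk in p]
        kw = [g.d * pk / g.gamma ** 2 for pk in p]
        if p_old is not None and max(abs(x - y) for x, y in zip(p, p_old)) < tol:
            return p, it
        p_old = p
    return p, max_iter


if __name__ == "__main__":
    game = ScalarGame(a=-1.0, b=1.0, d=1.0, q=1.0, r=1.0,
                      gamma=2.0, F=0.0, T=1.0, N=1000)
    p_pi, iters = policy_iteration(game)
    p_ric = riccati_backward(game)
    print(iters, p_pi[0], p_ric[0])
```

At a fixed point of the iteration, the improved gains turn the evaluation ODE into the Riccati ODE, so both functions should agree on the same grid; the PI loop plays the role that the paper's offline PI plays for the full nonlinear, coupled problem.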
DOI:	10.1109/TSMC.2025.3580988