Stochastic linear quadratic optimal tracking control for discrete-time systems with delays based on Q-learning algorithm

Detailed bibliography
Published in: AIMS Mathematics, Vol. 8, No. 5, pp. 10249-10265
Main authors: Tan, Xufeng; Li, Yuan; Liu, Yang
Format: Journal Article
Language: English
Publication details: AIMS Press, 01.01.2023
ISSN: 2473-6988
Description
Summary: In this paper, a reinforcement Q-learning method based on value iteration (VI) is proposed for a class of model-free stochastic linear quadratic (SLQ) optimal tracking problems with time delay. Compared with traditional reinforcement learning methods, the Q-learning method avoids the need for an accurate system model. Firstly, the delay operator is introduced to construct a novel augmented system composed of the original system and the command generator. Secondly, the SLQ optimal tracking problem is transformed into a deterministic one by system transformation, and the corresponding Q function for SLQ optimal tracking control is derived. Based on this, a Q-learning algorithm is proposed and its convergence is proved. Finally, a simulation example shows the effectiveness of the proposed algorithm.
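
For illustration only, below is a minimal Python sketch of a value-iteration (VI) update of the Q-function kernel for a linear quadratic tracking problem on an augmented state built from the plant state and a command generator. It is a model-based caricature under assumed matrices and a hypothetical discount factor; it omits the time delays, the stochastic-to-deterministic transformation, and the data-driven least-squares estimation of the Q function that make the paper's method model-free.

    import numpy as np

    # Assumed plant, command generator, and weights (not from the paper).
    A = np.array([[0.9, 0.1],
                  [0.0, 0.8]])          # plant dynamics
    B = np.array([[0.0],
                  [0.5]])               # input matrix
    F = np.array([[1.0]])               # command generator: r_{k+1} = F r_k
    C = np.array([[1.0, 0.0]])          # tracked output y_k = C x_k

    T  = np.block([[A, np.zeros((2, 1))],
                   [np.zeros((1, 2)), F]])   # augmented dynamics z_{k+1} = T z_k + Bz u_k
    Bz = np.vstack([B, np.zeros((1, 1))])
    E  = np.hstack([C, -np.eye(1)])          # tracking error e_k = C x_k - r_k
    Q1 = 10.0 * E.T @ E                      # state weight on the augmented state
    R  = np.array([[1.0]])                   # control weight
    gamma = 0.95                             # discount factor (assumed) keeps the tracking cost finite

    n, m = T.shape[0], Bz.shape[1]
    W = np.block([[Q1, np.zeros((n, m))],
                  [np.zeros((m, n)), R]])    # one-step cost kernel
    H = np.zeros((n + m, n + m))             # Q_j(z, u) = [z; u]^T H_j [z; u], with H_0 = 0

    for j in range(500):                     # value iteration on the Q-function kernel
        Hzz, Hzu = H[:n, :n], H[:n, n:]
        Huz, Huu = H[n:, :n], H[n:, n:]
        # cost-to-go kernel implied by the current Q function (pinv guards the H_0 = 0 start)
        P = Hzz - Hzu @ np.linalg.pinv(Huu) @ Huz
        G = np.hstack([T, Bz])
        H_next = W + gamma * G.T @ P @ G
        if np.max(np.abs(H_next - H)) < 1e-9:
            break
        H = H_next

    K = np.linalg.inv(H[n:, n:]) @ H[n:, :n]  # tracking feedback u_k = -K z_k
    print("iterations:", j + 1)
    print("tracking gain K =", K)

In the model-free setting described in the abstract, the same kernel H would instead be estimated from measured state, reference, and input data, so the recursion can run without knowledge of T and Bz.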
DOI: 10.3934/math.2023519