An adaptive dynamic programming-based algorithm for infinite-horizon linear quadratic stochastic optimal control problems



Bibliographic details
Published in:	Journal of applied mathematics & computing, Vol. 69, No. 3, pp. 2741-2760
Main author:	Zhang, Heng
Format:	Journal Article
Language:	English
Published:	Berlin/Heidelberg: Springer Berlin Heidelberg, 1 June 2023; Springer Nature B.V
Subjects:	Adaptive algorithms; Algorithms; Approximation; Brownian motion; Computational Mathematics and Numerical Analysis; Dynamic programming; Euclidean space; Mathematical and Computational Engineering; Mathematics; Mathematics and Statistics; Mathematics of Computing; Optimal control; Original Research; System dynamics; Theory of Computation; 93E20; Policy iteration; Model-free; Adaptive dynamic programming; 93E03; Linear quadratic stochastic optimal control
ISSN:	1598-5865, 1865-2085

Description
Summary:	This paper develops a novel adaptive dynamic programming (ADP)-based model-free policy iteration (PI) algorithm to solve an infinite-horizon continuous-time linear quadratic stochastic (LQS) optimal control problem, where the diffusion term in the system dynamics contains both control and state variables. First, we apply Itô's lemma and take expectations to obtain a relationship among the state trajectory, the control input and the matrices to be solved. Then, without requiring knowledge of all system coefficient matrices, the ADP-based model-free algorithm is developed to approximate the optimal control from the collected data. Moreover, we give a convergence analysis under some mild conditions. Finally, a numerical example and an illustrative application are presented to show that the proposed algorithm is effective.
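
For orientation, the kind of policy iteration the summary refers to can be sketched in its model-based form. This is not the paper's model-free algorithm: the sketch below assumes all coefficient matrices are known, which is exactly what the paper avoids. It also assumes the standard LQS setting dX = (AX + Bu) dt + (CX + Du) dW with cost E∫(X'QX + u'Ru) dt and feedback u = KX. The matrix names, the example values, and the function names are all illustrative and do not come from the record.

```python
import numpy as np

def policy_evaluation(A, B, C, D, Q, R, K):
    """Solve the generalized Lyapunov equation for the cost matrix P of u = K x:
    Ac'P + P Ac + Cc'P Cc + Q + K'RK = 0, with Ac = A + BK and Cc = C + DK."""
    n = A.shape[0]
    Ac = A + B @ K
    Cc = C + D @ K
    M = Q + K.T @ R @ K
    I = np.eye(n)
    # Vectorized linear operator acting on vec(P).
    L = np.kron(I, Ac.T) + np.kron(Ac.T, I) + np.kron(Cc.T, Cc.T)
    P = np.linalg.solve(L, -M.reshape(-1)).reshape(n, n)
    return 0.5 * (P + P.T)  # symmetrize against round-off

def policy_improvement(B, C, D, R, P):
    """Gain update K = -(R + D'PD)^{-1} (B'P + D'PC)."""
    return -np.linalg.solve(R + D.T @ P @ D, B.T @ P + D.T @ P @ C)

def sare_residual(A, B, C, D, Q, R, P):
    """Residual of the stochastic algebraic Riccati equation at P."""
    G = P @ B + C.T @ P @ D
    S = R + D.T @ P @ D
    return A.T @ P + P @ A + C.T @ P @ C + Q - G @ np.linalg.solve(S, G.T)

def policy_iteration(A, B, C, D, Q, R, K0, iters=20):
    """Alternate evaluation and improvement from a mean-square stabilizing K0."""
    K = K0
    for _ in range(iters):
        P = policy_evaluation(A, B, C, D, Q, R, K)
        K = policy_improvement(B, C, D, R, P)
    return P, K

# Illustrative (made-up) data: A is stable and the noise is small,
# so K0 = 0 is mean-square stabilizing.
A = np.array([[-1.0, 0.5], [0.0, -2.0]])
B = np.array([[0.0], [1.0]])
C = 0.1 * np.eye(2)
D = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])
K0 = np.zeros((1, 2))

P, K = policy_iteration(A, B, C, D, Q, R, K0)
print("Riccati residual norm:", np.linalg.norm(sare_residual(A, B, C, D, Q, R, P)))
```

In a model-free variant such as the one the summary describes, the evaluation step would be replaced by an estimate built from collected state and input data rather than from A, B, C and D directly.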
DOI:	10.1007/s12190-023-01857-9