An adaptive dynamic programming-based algorithm for infinite-horizon linear quadratic stochastic optimal control problems

Bibliographic Details
Published in:	Journal of applied mathematics & computing Vol. 69; no. 3; pp. 2741 - 2760
Main Author:	Zhang, Heng
Format:	Journal Article
Language:	English
Published:	Berlin/Heidelberg: Springer Berlin Heidelberg, 01.06.2023; Springer Nature B.V
Subjects:	Adaptive algorithms; Algorithms; Approximation; Brownian motion; Computational Mathematics and Numerical Analysis; Dynamic programming; Euclidean space; Mathematical and Computational Engineering; Mathematics; Mathematics and Statistics; Mathematics of Computing; Optimal control; Original Research; System dynamics; Theory of Computation; 93E20; Policy iteration; Model-free; Adaptive dynamic programming; 93E03; Linear quadratic stochastic optimal control
ISSN:	1598-5865, 1865-2085

Description
Summary:	This paper develops a novel adaptive dynamic programming (ADP)-based model-free policy iteration (PI) algorithm to solve an infinite-horizon continuous-time linear quadratic stochastic (LQS) optimal control problem in which the diffusion term of the system dynamics contains both control and state variables. First, we apply Itô’s lemma and take expectations to derive a relationship among the state trajectory, the control input, and the matrices to be solved. Then, without requiring knowledge of all system coefficient matrices, the ADP-based model-free algorithm is developed to approximate the optimal control from the collected data. Moreover, we give a convergence analysis under some mild conditions. Finally, a numerical example and an illustrative application are presented to show that the proposed algorithm is effective.
DOI:	10.1007/s12190-023-01857-9
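
For orientation, the kind of policy iteration the summary refers to can be illustrated with a minimal model-based sketch. This is not the paper's model-free algorithm: it assumes all coefficient matrices are known, whereas the paper estimates the required quantities from collected data. The notation is an assumption, not taken from this record: dynamics dx = (A x + B u) dt + (C x + D u) dW, cost E∫(x'Qx + u'Ru) dt, and feedback u = -K x. Each iteration evaluates the current gain by solving a generalized Lyapunov equation, then improves the gain. The example matrices are hypothetical and chosen so that the initial gain K0 = 0 is mean-square stabilizing.

```python
import numpy as np


def evaluate_policy(A, B, C, D, Q, R, K):
    """Solve Ak'P + P Ak + Ck'P Ck + Q + K'RK = 0 for the cost matrix P of u = -K x."""
    n = A.shape[0]
    Ak = A - B @ K          # closed-loop drift matrix
    Ck = C - D @ K          # closed-loop diffusion matrix
    M = Q + K.T @ R @ K     # running-cost weight under the policy
    I = np.eye(n)
    # Row-major vectorization: vec(X P Y) = kron(X, Y.T) @ vec(P).
    L = np.kron(Ak.T, I) + np.kron(I, Ak.T) + np.kron(Ck.T, Ck.T)
    P = np.linalg.solve(L, -M.reshape(-1)).reshape(n, n)
    return 0.5 * (P + P.T)  # symmetrize against round-off


def improve_policy(B, C, D, R, P):
    """Gain update K = (R + D'PD)^{-1} (B'P + D'PC)."""
    return np.linalg.solve(R + D.T @ P @ D, B.T @ P + D.T @ P @ C)


def sare_residual(A, B, C, D, Q, R, P):
    """Residual of the stochastic algebraic Riccati equation at P."""
    S = P @ B + C.T @ P @ D
    G = R + D.T @ P @ D
    return A.T @ P + P @ A + C.T @ P @ C + Q - S @ np.linalg.solve(G, S.T)


def policy_iteration(A, B, C, D, Q, R, K0, iterations=20):
    """Alternate policy evaluation and improvement from a stabilizing gain K0."""
    K = K0
    for _ in range(iterations):
        P = evaluate_policy(A, B, C, D, Q, R, K)
        K = improve_policy(B, C, D, R, P)
    return P, K


# Hypothetical example data (assumed for illustration only).
A = np.array([[-1.0, 0.5], [0.0, -2.0]])
B = np.array([[0.0], [1.0]])
C = 0.2 * np.eye(2)
D = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

P, K = policy_iteration(A, B, C, D, Q, R, K0=np.zeros((1, 2)))
print("Riccati residual norm:", np.linalg.norm(sare_residual(A, B, C, D, Q, R, P)))
```

The Kronecker-product solve is used only to keep the sketch short and dependency-free; it scales poorly with the state dimension and is not suggested as the approach taken in the paper.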