Continuous-Time Q-Learning for Infinite-Horizon Discounted Cost Linear Quadratic Regulator Problems

Bibliographic Details
Published in:	IEEE Transactions on Cybernetics, Vol. 45, no. 2, pp. 165-176
Main Authors:	Palanisamy, Muthukumar; Modares, Hamidreza; Lewis, Frank L.; Aurangzeb, Muhammad
Format:	Journal Article
Language:	English
Published:	United States: IEEE, 01.02.2015
Subjects:	Approximate dynamic programming (ADP); Approximation algorithms; continuous-time dynamical systems; Convergence; Discrete-time systems; Equations; Heuristic algorithms; infinite-horizon discounted cost function; integral reinforcement learning (IRL); Mathematical model; Optimal control; Q-learning; value iteration (VI)
ISSN:	2168-2267, 2168-2275

Description
Summary:	This paper presents a method of Q-learning to solve the discounted linear quadratic regulator (LQR) problem for continuous-time (CT) continuous-state systems. Most available methods in the existing literature for CT systems to solve the LQR problem generally need partial or complete knowledge of the system dynamics. Q-learning is effective for unknown dynamical systems, but has generally been well understood only for discrete-time systems. The contribution of this paper is to present a Q-learning methodology for CT systems which solves the LQR problem without having any knowledge of the system dynamics. A natural and rigorously justified parameterization of the Q-function is given in terms of the state, the control input, and its derivatives. This parameterization allows the implementation of an online Q-learning algorithm for CT systems. The simulation results supporting the theoretical development are also presented.
DOI:	10.1109/TCYB.2014.2322116