Twin Deterministic Policy Gradient Adaptive Dynamic Programming for Optimal Control of Affine Nonlinear Discrete-time Systems




Bibliographic Details
Published in:	International journal of control, automation, and systems Vol. 20; no. 9; pp. 3098 - 3109
Main Authors:	Xu, Jiahui, Wang, Jingcheng, Rao, Jun, Zhong, Yanjiu, Zhao, Shangwei
Format:	Journal Article
Language:	English
Published:	Bucheon / Seoul: Institute of Control, Robotics and Systems and The Korean Institute of Electrical Engineers; Springer Nature B.V.; 01.09.2022
Subjects:	Adaptive control; Algorithms; Control; Discrete time systems; Dynamic programming; Engineering; Liapunov functions; Machine learning; Mechatronics; Nonlinear control; Nonlinear systems; Optimal control; Regular Paper; Robotics; Control and Instrumentation Engineering; affine nonlinear system; twin deterministic policy gradient; Adaptive dynamic programming
ISSN:	1598-6446, 2005-4092

Description
Summary:	Recent achievements in the field of adaptive dynamic programming (ADP), together with the data resources and computational capabilities of modern control systems, have led to growing interest in learning and data-driven control technologies. This paper proposes a twin deterministic policy gradient adaptive dynamic programming (TDPGADP) algorithm to solve the optimal control problem for discrete-time affine nonlinear systems in a model-free setting. To mitigate the overestimation problem caused by function approximation errors, the smaller of the two Q-network estimates is used to update the control policy. The convergence of the proposed algorithm is verified, with the value function serving as the Lyapunov function. The twin actor-critic network structure, combined with target networks and a specially designed adaptive experience replay mechanism, makes the algorithm convenient to implement and can improve the sample efficiency of the learning process. Two simulation examples verify the efficacy of the proposed method.
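The summary does not give the update equations, so the following is only a minimal sketch of the general "twin critic" idea it describes, under stated assumptions. It uses the TD3-style clipped double-Q target y = r + γ·min(Q1'(x', μ'(x')), Q2'(x', μ'(x'))), written in TD3's reward-maximization convention; how the paper maps this onto its cost-based ADP formulation is not stated here. The quadratic basis `features`, the linear feedback actor u = k·x, and all function names are hypothetical, and `UniformReplay` is a plain uniform buffer standing in for the paper's adaptive experience replay, which the summary does not describe.

```python
import random
from collections import deque
from typing import List, Sequence, Tuple

# Hypothetical transition layout: (state, control, reward, next_state)
Transition = Tuple[float, float, float, float]


def features(x: float, u: float) -> List[float]:
    """Hypothetical quadratic basis for a scalar state x and control u."""
    return [x * x, x * u, u * u]


def q_value(w: Sequence[float], x: float, u: float) -> float:
    """Linear-in-weights critic: Q(x, u) = w . phi(x, u)."""
    return sum(wi * fi for wi, fi in zip(w, features(x, u)))


def twin_target(r: float, x_next: float, w1_targ: Sequence[float],
                w2_targ: Sequence[float], k_targ: float, gamma: float) -> float:
    """TD3-style target: use the smaller of the two target-critic estimates."""
    u_next = k_targ * x_next  # target actor, assumed linear feedback u = k * x
    q1 = q_value(w1_targ, x_next, u_next)
    q2 = q_value(w2_targ, x_next, u_next)
    return r + gamma * min(q1, q2)


def critic_step(w: Sequence[float], x: float, u: float, y: float,
                lr: float) -> List[float]:
    """One semi-gradient step on 0.5 * (y - Q(x, u))**2, with y held fixed."""
    err = y - q_value(w, x, u)
    return [wi + lr * err * fi for wi, fi in zip(w, features(x, u))]


def soft_update(target: Sequence[float], online: Sequence[float],
                tau: float) -> List[float]:
    """Polyak averaging: move target weights a fraction tau toward online ones."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target, online)]


class UniformReplay:
    """Plain uniform replay buffer; NOT the paper's adaptive mechanism."""

    def __init__(self, capacity: int) -> None:
        self.buf: deque = deque(maxlen=capacity)  # oldest entries drop out first

    def add(self, tr: Transition) -> None:
        self.buf.append(tr)

    def sample(self, batch_size: int) -> List[Transition]:
        return random.sample(list(self.buf), min(batch_size, len(self.buf)))

    def __len__(self) -> int:
        return len(self.buf)


if __name__ == "__main__":
    y = twin_target(r=1.0, x_next=2.0, w1_targ=[1.0, 0.0, 0.5],
                    w2_targ=[0.8, 0.0, 0.5], k_targ=-0.5, gamma=0.9)
    print(round(y, 2))  # min(4.5, 3.7) = 3.7, so y = 1 + 0.9 * 3.7 = 4.33
```

Both critics regress toward the same pessimistic target, so an approximation error that inflates one critic's estimate is not propagated unless the other critic makes the same error. The target weights then trail the online weights through `soft_update`, which is one common way of realizing the "target network" mentioned in the summary.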
URL:	http://link.springer.com/article/10.1007/s12555-021-0473-6
DOI:	10.1007/s12555-021-0473-6