Value Iteration Adaptive Dynamic Programming for Optimal Control of Discrete-Time Nonlinear Systems

Bibliographic details
Published in:	IEEE Transactions on Cybernetics, Vol. 46, No. 3, pp. 840-853
Main authors:	Wei, Qinglai; Liu, Derong; Lin, Hanquan
Format:	Journal Article
Language:	English
Publication details:	United States: IEEE, 01.03.2016; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:	Adaptive control systems; Adaptive critic designs; adaptive dynamic programming (ADP); Algorithms; approximate dynamic programming; Computer simulation; Convergence; Dynamic programming; Dynamical systems; Heuristic algorithms; Iterative algorithms; Iterative methods; Mathematical analysis; Mathematical models; neural networks; neuro-dynamic programming; Nonlinear dynamics; Nonlinear systems; Optimal control; Performance analysis; reinforcement learning; value iteration
ISSN:	2168-2267, 2168-2275

Description
Summary:	In this paper, a value iteration adaptive dynamic programming (ADP) algorithm is developed to solve infinite horizon undiscounted optimal control problems for discrete-time nonlinear systems. The present value iteration ADP algorithm permits an arbitrary positive semi-definite function to initialize the algorithm. A novel convergence analysis is developed to guarantee that the iterative value function converges to the optimal performance index function. Initialized by different initial functions, it is proven that the iterative value function will be monotonically nonincreasing, monotonically nondecreasing, or nonmonotonic and will converge to the optimum. In this paper, for the first time, the admissibility properties of the iterative control laws are developed for value iteration algorithms. It is emphasized that new termination criteria are established to guarantee the effectiveness of the iterative control laws. Neural networks are used to approximate the iterative value function and compute the iterative control law, respectively, for facilitating the implementation of the iterative ADP algorithm. Finally, two simulation examples are given to illustrate the performance of the present method.
DOI:	10.1109/TCYB.2015.2492242
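
The summary describes value iteration for an undiscounted, infinite-horizon problem started from a positive semi-definite initial function. The following is a minimal sketch of that general scheme, not the authors' implementation: it replaces the neural-network approximators with a lookup table on a state grid, and it stops on a plain sup-norm change rather than the termination criteria established in the paper. The dynamics `f`, the utility `utility`, the grids `xs` and `us`, and the function name `value_iteration` are all hypothetical choices made for illustration.

```python
import numpy as np


def value_iteration(f, utility, xs, us, v0, n_iter=500, tol=1e-8):
    """Grid-based value iteration for an undiscounted problem (illustrative).

    Repeats V_{i+1}(x) = min_u [ utility(x, u) + V_i(f(x, u)) ] on the grid xs,
    searching over the control grid us. Returns the final value table, a
    greedy control table, and the list of all iterates V_0, V_1, ...
    """
    X, U = np.meshgrid(xs, us, indexing="ij")  # every (state, control) pair
    stage_cost = utility(X, U)                 # utility on the grid
    next_state = f(X, U)                       # x_{k+1} = f(x_k, u_k)

    v = np.asarray(v0(xs), dtype=float)        # arbitrary PSD initial function
    history = [v.copy()]
    for _ in range(n_iter):
        # Linear interpolation of V_i at the successor states;
        # np.interp clamps successors that leave the grid to the edge values.
        q = stage_cost + np.interp(next_state, xs, v)
        v_new = q.min(axis=1)
        history.append(v_new.copy())
        done = np.max(np.abs(v_new - v)) < tol  # assumed stopping rule
        v = v_new
        if done:
            break

    q = stage_cost + np.interp(next_state, xs, v)
    control = us[np.argmin(q, axis=1)]         # greedy control for the final V
    return v, control, history


# Hypothetical scalar nonlinear system and quadratic utility.
def f(x, u):
    return 0.8 * np.sin(x) + u


def utility(x, u):
    return x**2 + u**2


xs = np.linspace(-2.0, 2.0, 81)
us = np.linspace(-1.0, 1.0, 41)

# Zero initial function: the iterates should be nondecreasing.
v_zero, control, hist = value_iteration(
    f, utility, xs, us, lambda x: np.zeros_like(x)
)
```

Starting from the zero function gives one of the three behaviours named in the summary (a monotonically nondecreasing sequence). Passing a different `v0`, for example a large quadratic, lets one observe how the iterates behave from other initial functions on this toy problem.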