An Efficient Impulsive Adaptive Dynamic Programming Algorithm for Stochastic Systems



Bibliographic Details
Published in:	IEEE Transactions on Cybernetics, Vol. 53, No. 9, pp. 5545-5559
Main Authors:	Liang, Mingming; Wang, Yonghua; Liu, Derong
Format:	Journal Article
Language:	English
Published:	United States: IEEE, 1 September 2023; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:	Adaptive algorithms; Adaptive dynamic programming (ADP); Aerospace electronics; Algorithms; Approximation algorithms; Control theory; Convergence; Dynamic programming; Heuristic algorithms; impulsive stochastic systems; Iterative methods; Markov processes; optimal control; Performance indices; policy iteration; Probability distribution; Stability analysis; Stochastic systems; transition matrix
ISSN:	2168-2267, 2168-2275

Description
Summary:	This study defines a novel general impulsive transition matrix that reveals the transition dynamics and the evolution of the probability distribution for all system states between two impulsive "events," rather than between two regular time indexes. Based on this general matrix, a policy iteration-based impulsive adaptive dynamic programming (IADP) algorithm and a more efficient variant (EIADP) are developed to solve optimal impulsive control problems for discrete stochastic systems. By analyzing the monotonicity, stability, and convergence properties of the iterative value functions and control laws, it is proved that both the IADP and EIADP algorithms converge to the optimal impulsive performance index function. The EIADP algorithm divides the whole impulsive policy into smaller pieces and updates the iterative policies "piece by piece" according to the actual hardware constraints. This feature allows these ADP-based algorithms to run on computing devices of all "sizes," including those with little memory. A simulation experiment is conducted to validate the effectiveness of the proposed methods.
DOI:	10.1109/TCYB.2022.3158898
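
The "piece-by-piece" policy update mentioned in the summary can be illustrated with a minimal sketch. This is not the authors' IADP or EIADP algorithm, and it does not use their impulsive transition matrix; it is ordinary policy iteration on a small finite Markov decision process, with the improvement step restricted to one block of states at a time. Every name here (`evaluate_policy`, `piecewise_policy_iteration`, `piece_size`) and the three-state toy problem with a "no impulse" and an "impulse" action are assumptions made up for illustration.

```python
def evaluate_policy(P, cost, policy, gamma, tol=1e-12, max_iters=10_000):
    """Approximate the expected discounted cost of a fixed policy by
    repeated substitution (fixed-point iteration on its Bellman equation)."""
    n = len(policy)
    V = [0.0] * n
    for _ in range(max_iters):
        V_new = [
            cost[s][policy[s]]
            + gamma * sum(p * v for p, v in zip(P[policy[s]][s], V))
            for s in range(n)
        ]
        if max(abs(a - b) for a, b in zip(V_new, V)) < tol:
            return V_new
        V = V_new
    return V


def piecewise_policy_iteration(P, cost, gamma=0.9, piece_size=2, max_sweeps=100):
    """Policy iteration whose improvement step touches only `piece_size`
    states at a time (hypothetical stand-in for a memory-limited update).

    P[a][s][j] is the probability of moving from state s to state j under
    action a; cost[s][a] is the one-step cost of taking action a in state s.
    """
    n_actions, n_states = len(P), len(P[0])
    policy = [0] * n_states
    for _ in range(max_sweeps):
        changed = False
        for start in range(0, n_states, piece_size):
            # Re-evaluate the current policy, then improve this piece only.
            V = evaluate_policy(P, cost, policy, gamma)
            for s in range(start, min(start + piece_size, n_states)):
                q = [
                    cost[s][a] + gamma * sum(p * v for p, v in zip(P[a][s], V))
                    for a in range(n_actions)
                ]
                best = min(range(n_actions), key=q.__getitem__)
                # Switch only on a strict improvement so ties cannot cycle.
                if q[best] < q[policy[s]] - 1e-9:
                    policy[s] = best
                    changed = True
        if not changed:  # no state in any piece could be improved
            break
    return policy, evaluate_policy(P, cost, policy, gamma)


# Toy problem (assumed, not from the paper): action 0 lets the state drift
# upward, action 1 is an "impulse" that resets to state 0 at extra cost 1.5.
P = [
    [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]],  # action 0: no impulse
    [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],  # action 1: impulse
]
cost = [[0.0, 1.5], [1.0, 2.5], [2.0, 3.5]]

policy, V = piecewise_policy_iteration(P, cost, gamma=0.9, piece_size=1)
print(policy, V)
```

In this sketch `piece_size` bounds how many states have their action values computed at once; setting it to the number of states recovers the usual full improvement step. In the toy problem both settings should arrive at the same value function.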