Convergence analysis of the deep neural networks based globalized dual heuristic programming

Bibliographic Details
Published in: Automatica (Oxford), Vol. 122, Article 109222
Main Authors: Kim, Jong Woo; Oh, Tae Hoon; Son, Sang Hwan; Jeong, Dong Hwi; Lee, Jong Min
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.12.2020
ISSN: 0005-1098, 1873-2836
Online Access: Full text
Description
Abstract: The globalized dual heuristic programming (GDHP) algorithm is a special form of the approximate dynamic programming (ADP) method that solves the Hamilton–Jacobi–Bellman (HJB) equation for systems in control-affine form subject to a quadratic cost function. This study incorporates deep neural networks (DNNs) as function approximators in order to exploit their capacity to represent high-dimensional function spaces. An elementwise error bound on the costate function sequence is newly derived, and the corresponding convergence property is presented. In the approximated function space, a uniformly ultimate boundedness (UUB) condition for the weights of general multi-layer NNs is obtained. It is also proved that, under the gradient descent method for solving the moving-target regression problem, the ultimate bound gradually converges to a value that contains only the approximation reconstruction error. The proposed method is demonstrated on a continuous reactor control problem with the aim of obtaining a control policy for multiple initial states, which justifies the need for the DNN structure in such cases.
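
For orientation, the problem class that GDHP-type ADP methods address is typically the discrete-time, control-affine optimal control problem sketched below in LaTeX; the notation is a generic assumption for illustration and need not match the paper's own symbols.

\begin{align*}
  x_{k+1}   &= f(x_k) + g(x_k)\,u_k, \\
  J(x_k)    &= \sum_{i=k}^{\infty} \bigl( x_i^{\top} Q\,x_i + u_i^{\top} R\,u_i \bigr), \\
  J^{*}(x_k) &= \min_{u_k} \bigl[ x_k^{\top} Q\,x_k + u_k^{\top} R\,u_k + J^{*}(x_{k+1}) \bigr], \\
  u^{*}(x_k) &= -\tfrac{1}{2} R^{-1} g(x_k)^{\top} \lambda^{*}(x_{k+1}),
  \qquad \lambda^{*}(x) = \partial J^{*}(x)/\partial x.
\end{align*}

In this setting, a GDHP critic is trained to match both the scalar value target and its state gradient (the costate). Because both targets are recomputed from the current critic at every iteration, each gradient-descent step amounts to a moving-target regression, which is the setting of the UUB analysis summarized in the abstract.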
DOI: 10.1016/j.automatica.2020.109222