A model-based deep reinforcement learning method applied to finite-horizon optimal control of nonlinear control-affine system


Bibliographic Details
Published in: Journal of Process Control, Vol. 87, pp. 166-178
Main Authors: Kim, Jong Woo; Park, Byung Jun; Yoo, Haeun; Oh, Tae Hoon; Lee, Jay H.; Lee, Jong Min
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.03.2020
ISSN:0959-1524, 1873-2771
Description
Summary:
Highlights:
• A model-based deep reinforcement learning (DRL) algorithm that solves the Hamilton–Jacobi–Bellman equation for finite-horizon optimal control of nonlinear control-affine systems is developed.
• Deep neural networks (DNNs) are implemented to approximate the value function, its first-order derivative (i.e., the costate function), and the policy function.
• State-of-the-art DRL methods are incorporated to train the DNNs efficiently.
• The use of DNNs allows the algorithm to be applied to high-dimensional problems and is shown to improve the performance of the learned policy in the presence of uncertainty.
• Examples involving a batch chemical reactor and a diffusion-convection-reaction system are used to demonstrate these points.

Abstract: The Hamilton–Jacobi–Bellman (HJB) equation can be solved to obtain optimal closed-loop control policies for general nonlinear systems. As it is seldom possible to solve the HJB equation exactly for nonlinear systems, either analytically or numerically, methods that build approximate solutions through simulation-based learning have been studied under various names, such as neurodynamic programming (NDP) and approximate dynamic programming (ADP). The learning aspect connects these methods to reinforcement learning (RL), which also seeks to learn optimal decision policies through trial-and-error learning. This study develops a model-based RL method that iteratively learns the solution to the HJB equation and its associated equations. We focus in particular on control-affine systems with a quadratic objective function and the finite-horizon optimal control (FHOC) problem with time-varying reference trajectories. The HJB solutions for such systems involve time-varying value, costate, and policy functions subject to boundary conditions. To represent the time-varying HJB solution in a high-dimensional state space in a general and efficient way, deep neural networks (DNNs) are employed. It is shown that the use of DNNs, compared to shallow neural networks (SNNs), can significantly improve the performance of the learned policy in the presence of an uncertain initial state and state noise. Examples involving a batch chemical reactor and a one-dimensional diffusion-convection-reaction system are used to demonstrate this and other key aspects of the method.
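For orientation, the following is a minimal sketch of the finite-horizon HJB formulation the abstract refers to, assuming standard control-affine dynamics and a quadratic tracking cost with stage weights Q, R and terminal weight Q_f; the exact weighting, reference, and terminal-cost choices used in the paper are not stated in the abstract and are assumed here.

\[
\dot{x} = f(x) + g(x)\,u, \qquad
J = \big(x(t_f)-x_{\mathrm{ref}}(t_f)\big)^{\top} Q_f \big(x(t_f)-x_{\mathrm{ref}}(t_f)\big)
  + \int_{t}^{t_f} \Big[\big(x-x_{\mathrm{ref}}\big)^{\top} Q \big(x-x_{\mathrm{ref}}\big) + u^{\top} R\,u\Big]\, d\tau
\]

The HJB equation for the time-varying value function V(x,t), with costate \(\lambda(x,t) = \partial V/\partial x\), is

\[
-\frac{\partial V}{\partial t}
  = \min_{u}\Big[\big(x-x_{\mathrm{ref}}\big)^{\top} Q \big(x-x_{\mathrm{ref}}\big) + u^{\top} R\,u
  + \lambda^{\top}\big(f(x)+g(x)u\big)\Big],
\qquad
V(x,t_f) = \big(x-x_{\mathrm{ref}}(t_f)\big)^{\top} Q_f \big(x-x_{\mathrm{ref}}(t_f)\big).
\]

Because the dynamics are affine in u and the cost is quadratic in u, the inner minimization has the closed-form solution

\[
u^{*}(x,t) = -\tfrac{1}{2}\, R^{-1} g(x)^{\top} \lambda(x,t),
\]

which, substituted back, yields the partial differential equation

\[
-\frac{\partial V}{\partial t}
  = \big(x-x_{\mathrm{ref}}\big)^{\top} Q \big(x-x_{\mathrm{ref}}\big)
  + \lambda^{\top} f(x)
  - \tfrac{1}{4}\,\lambda^{\top} g(x)\, R^{-1} g(x)^{\top} \lambda .
\]

In this reading, the three DNNs mentioned in the highlights approximate V, \(\lambda\), and \(u^{*}\), and the terminal cost supplies the boundary condition referred to in the abstract.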
DOI:10.1016/j.jprocont.2020.02.003