Approximate dynamic programming-based approaches for input–output data-driven control of nonlinear processes

Bibliographic details
Published in:	Automatica (Oxford) Vol. 41; No. 7; pp. 1281-1288
Main authors:	Lee, Jong Min, Lee, Jay H.
Format:	Journal Article
Language:	English
Published:	Oxford: Elsevier Ltd, 1 July 2005
Subjects:	[formula omitted]-learning, Applied sciences, Approximate dynamic programming, Autoregressive model, Computer science; control theory; systems, Control theory. Systems, Dynamic programming, Dynamic test, Empirical model, Exact sciences and technology, Learning algorithm, Model matching, Model predictive control, Modeling, NARX model, Non linear control, Non linear model, Nonlinear model identification, Nonlinear model predictive control, Optimal control, Predictive control, Process control, Process control. Computer integrated manufacturing, Q-learning, Reinforcement learning, System identification
ISSN:	0005-1098, 1873-2836

Description
Abstract:	We propose two approximate dynamic programming (ADP)-based strategies for control of nonlinear processes using input–output data. In the first strategy, which we term ‘J-learning,’ one builds an empirical nonlinear model using closed-loop test data and performs dynamic programming with it to derive an improved control policy. In the second strategy, called ‘Q-learning,’ one tries to learn an improved control policy in a model-less manner. Compared to the conventional model predictive control approach, the new approach offers some practical advantages in using nonlinear empirical models for process control. Besides the potential reduction in the on-line computational burden, it offers a convenient way to control the degree of model extrapolation in the calculation of optimal control moves. One major difficulty associated with using an empirical model within the multi-step predictive control setting is that the model can be excessively extrapolated into regions of the state space where identification data were scarce or nonexistent, leading to performance far worse than predicted by the model. Within the proposed ADP-based strategies, this problem is handled by imposing a penalty term designed on the basis of local data distribution. A CSTR example is provided to illustrate the proposed approaches.
DOI:	10.1016/j.automatica.2005.02.006
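
The abstract's idea of penalizing extrapolation according to the local data distribution can be illustrated with a minimal sketch. This is not the authors' formulation: the Gaussian kernel density estimate, the quadratic penalty shape, and every name and parameter below (`kernel_density`, `extrapolation_penalty`, `penalized_greedy_input`, `bandwidth`, `min_density`, `weight`, `gamma`) are assumptions chosen for illustration. The sketch adds the penalty to a one-step lookahead over a given approximate cost-to-go, so that inputs driving the predicted state into sparsely sampled regions look expensive.

```python
import numpy as np

def kernel_density(query, data, bandwidth):
    """Gaussian kernel density estimate at `query`.

    `data` must have shape (n_samples, dim). Assumed form, not from the paper.
    """
    data = np.asarray(data, dtype=float)
    query = np.asarray(query, dtype=float).reshape(1, -1)
    dim = data.shape[1]
    sq_dist = np.sum((data - query) ** 2, axis=1)
    norm = (2.0 * np.pi * bandwidth ** 2) ** (dim / 2.0)
    return float(np.mean(np.exp(-0.5 * sq_dist / bandwidth ** 2)) / norm)

def extrapolation_penalty(query, data, bandwidth=0.5, min_density=0.05, weight=10.0):
    """Zero where data are dense; grows quadratically as density drops below min_density."""
    density = kernel_density(query, data, bandwidth)
    shortfall = max(0.0, min_density - density) / min_density
    return weight * shortfall ** 2

def penalized_greedy_input(x, candidate_inputs, model, stage_cost, cost_to_go,
                           data, gamma=0.95, weight=10.0):
    """Pick the candidate input minimizing stage cost plus penalized cost-to-go."""
    best_u, best_value = None, np.inf
    for u in candidate_inputs:
        x_next = model(x, u)  # prediction from the (hypothetical) empirical model
        penalty = extrapolation_penalty(x_next, data, weight=weight)
        value = stage_cost(x, u) + gamma * (cost_to_go(x_next) + penalty)
        if value < best_value:
            best_u, best_value = u, value
    return best_u

# Toy setup (all hypothetical): identification data cover only [-1, 1].
data = np.linspace(-1.0, 1.0, 50).reshape(-1, 1)
model = lambda x, u: x + u
stage_cost = lambda x, u: x ** 2 + 0.1 * u ** 2
cost_to_go = lambda x: 0.2 * (x - 4.5) ** 2  # tempts the controller to leave the data region
candidates = [-0.5, 0.0, 4.0]

u_penalized = penalized_greedy_input(0.5, candidates, model, stage_cost, cost_to_go, data)
u_unpenalized = penalized_greedy_input(0.5, candidates, model, stage_cost, cost_to_go,
                                       data, weight=0.0)
```

In this toy setup the unpenalized lookahead selects the large move into the region with no data, while the penalized one stays where the model is supported by data. The threshold and weight would need tuning for any real process.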