A novel sim2real reinforcement learning algorithm for process control

Bibliographic Details
Published in:Reliability engineering & system safety Vol. 254; p. 110639
Main Authors: Liang, Huiping, Xie, Junyao, Huang, Biao, Li, Yonggang, Sun, Bei, Yang, Chunhua
Format: Journal Article
Language:English
Published: Elsevier Ltd 01.02.2025
ISSN:0951-8320
Description
Summary: While reinforcement learning (RL) has potential in advanced process control and optimization, its direct interaction with real industrial processes can pose safety concerns. Model-based pre-training of RL may alleviate such risks; however, the intricate nature of industrial processes makes it difficult to build entirely accurate simulation models, so RL-based controllers that rely on simulation models can easily suffer from model-plant mismatch. Alternatively, pre-training RL on offline data can also mitigate safety risks, but it requires well-represented historical datasets, which is demanding because industrial processes mostly run in a regulatory mode under basic controllers. To address these issues, this paper proposes a novel sim2real reinforcement learning algorithm. First, a state adaptor (SA) is proposed to align simulated states with real states and thereby mitigate the model-plant mismatch. Then, a fixed-horizon return is designed to replace the traditional infinite-step return, providing genuine labels for the critic network and enhancing learning efficiency and stability. Finally, proximal policy optimization (PPO) is applied to implement the proposed sim2real algorithm, yielding the SA-PPO method. Experimental results on a roasting process simulation show that SA-PPO improves performance by 1.96% in MSE and 21.64% in R on average, verifying the effectiveness of the proposed method.
Highlights:
•A new sim2real RL method is introduced for process control.
•The state adaptor significantly mitigates the impact of modeling error on RL control performance.
•The fixed-horizon return enhances learning efficiency in process control.
•Numerical and roasting process simulations validate the method's effectiveness.
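The abstract states that a fixed-horizon return replaces the traditional infinite-step return as the critic's label. The sketch below shows one common way such a target can be computed from a logged reward trajectory; it is a minimal illustration only, and the function name, horizon, and discount factor are assumptions rather than the paper's implementation.

```python
import numpy as np

def fixed_horizon_returns(rewards: np.ndarray, horizon: int, gamma: float = 0.99) -> np.ndarray:
    """Compute H-step discounted returns G_t = sum_{k=0}^{H-1} gamma^k * r_{t+k}.

    Unlike a bootstrapped infinite-step return, each target is a fully observed
    quantity, which is the sense in which it can serve as a "genuine" label for
    the critic network. Horizon and discount values here are illustrative.
    """
    T = len(rewards)
    returns = np.zeros(T)
    for t in range(T):
        window = rewards[t:t + horizon]          # rewards within the fixed horizon
        discounts = gamma ** np.arange(len(window))
        returns[t] = np.dot(discounts, window)   # discounted sum over the window
    return returns

# Example: critic targets for a short reward trace with a 5-step horizon.
r = np.array([1.0, 0.5, 0.0, -0.2, 0.3, 1.0, 0.0])
print(fixed_horizon_returns(r, horizon=5, gamma=0.95))
```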
DOI:10.1016/j.ress.2024.110639