A novel sim2real reinforcement learning algorithm for process control

Bibliographic Details
Published in: Reliability Engineering & System Safety, Vol. 254, p. 110639
Authors: Liang, Huiping; Xie, Junyao; Huang, Biao; Li, Yonggang; Sun, Bei; Yang, Chunhua
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.02.2025
ISSN: 0951-8320
Description
Abstract: While reinforcement learning (RL) has potential in advanced process control and optimization, its direct interaction with real industrial processes can pose safety concerns. Model-based pre-training of RL may alleviate such risks. However, the intricate nature of industrial processes complicates the establishment of entirely accurate simulation models. Consequently, RL-based controllers relying on simulation models can easily suffer from model-plant mismatch. Alternatively, utilizing offline data for pre-training of RL can also mitigate safety risks, but it requires well-represented historical datasets. This is demanding because industrial processes mostly run under a regulatory mode with basic controllers. To handle these issues, this paper proposes a novel sim2real reinforcement learning algorithm. First, a state adaptor (SA) is proposed to align simulated states with real states and mitigate the model-plant mismatch. Then, a fixed-horizon return is designed to replace the traditional infinite-step return, providing genuine labels for the critic network and enhancing learning efficiency and stability. Finally, applying proximal policy optimization (PPO), the SA-PPO method is introduced to implement the proposed sim2real algorithm. Experimental results show that SA-PPO improves performance in MSE by 1.96% and in R by 21.64% on average for the roasting process simulation, verifying the effectiveness of the proposed method.

Highlights:
• A new sim2real RL method is introduced for process control.
• The state adaptor significantly mitigates the impact of modeling error on RL control performance.
• The fixed-horizon return enhances learning efficiency in process control.
• Numerical and roasting process simulations validate the method's effectiveness.
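The abstract does not spell out the return definition, but a fixed-horizon return of the kind it describes is conventionally the discounted sum of the next H rewards. Unlike the infinite-step return, it can be computed exactly from a logged length-H trajectory segment, which is what makes it usable as a regression target ("genuine label") for the critic. A sketch in standard notation, where the horizon H and discount factor γ are assumed hyperparameters rather than values taken from the paper:

    G_t^{(H)} = \sum_{k=0}^{H-1} \gamma^k r_{t+k}
    \quad\text{as opposed to}\quad
    G_t = \sum_{k=0}^{\infty} \gamma^k r_{t+k}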
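Likewise, the state adaptor is described only as "aligning simulated states with real states". Below is a minimal sketch of one plausible realization under that description: a small residual network trained by regression on paired simulated/real states, whose output would then be fed to the PPO policy in place of the raw simulated state. All names, the architecture, and the training loop are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a state adaptor: a network g_phi maps simulated
# states toward their real counterparts, trained on paired samples.
import torch
import torch.nn as nn

class StateAdaptor(nn.Module):
    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, sim_state: torch.Tensor) -> torch.Tensor:
        # Residual correction: adapted state = simulated state + learned offset.
        return sim_state + self.net(sim_state)

def train_adaptor(adaptor, sim_states, real_states, epochs=200, lr=1e-3):
    """Fit the adaptor on aligned simulated/real state pairs by regression."""
    opt = torch.optim.Adam(adaptor.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(adaptor(sim_states), real_states)
        loss.backward()
        opt.step()
    return adaptor

if __name__ == "__main__":
    torch.manual_seed(0)
    sim = torch.randn(256, 4)        # states from the simulation model
    real = sim * 1.1 + 0.05          # toy stand-in for mismatched plant states
    adaptor = train_adaptor(StateAdaptor(4), sim, real)
    print(nn.functional.mse_loss(adaptor(sim), real).item())
```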
DOI: 10.1016/j.ress.2024.110639