A novel sim2real reinforcement learning algorithm for process control
| Published in: | Reliability Engineering & System Safety, Volume 254, p. 110639 |
|---|---|
| Main authors: | , , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Ltd, 01.02.2025 |
| ISSN: | 0951-8320 |
| Summary: | While reinforcement learning (RL) has potential in advanced process control and optimization, its direct interaction with real industrial processes can pose safety risks. Model-based pre-training of RL may alleviate such risks. However, the intricate nature of industrial processes complicates the establishment of entirely accurate simulation models; consequently, RL-based controllers that rely on simulation models easily suffer from model-plant mismatch. Alternatively, offline data can be used to pre-train RL and likewise mitigate safety risks, but this requires well-represented historical datasets, which is demanding because industrial processes mostly run in regulatory mode under basic controllers. To handle these issues, this paper proposes a novel sim2real reinforcement learning algorithm. First, a state adaptor (SA) is proposed to align simulated states with real states and thus mitigate the model-plant mismatch. Then, a fixed-horizon return is designed to replace the traditional infinite-step return and provide genuine labels for the critic network, enhancing learning efficiency and stability. Finally, the SA-PPO method, built on proximal policy optimization (PPO), is introduced to implement the proposed sim2real algorithm. Experimental results show that SA-PPO improves performance, on average, by 1.96% in MSE and 21.64% in R for the roasting process simulation, verifying the effectiveness of the proposed method. |
| Highlights: | •A new sim2real RL method is introduced for process control. •The state adaptor significantly mitigates the impact of modeling error on RL control performance. •The fixed-horizon return enhances learning efficiency in process control. •Numerical and roasting process simulations validate the method's effectiveness. (An illustrative sketch of the state adaptor and fixed-horizon return follows the table below.) |
| DOI: | 10.1016/j.ress.2024.110639 |
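The abstract names two algorithmic ingredients, a state adaptor that aligns simulated states with real states and a fixed-horizon return that replaces the infinite-step return as the critic's label, but it does not specify how either is implemented. The snippet below is only a minimal NumPy sketch under stated assumptions, not the paper's method: `LinearStateAdaptor` stands in for the learned SA with a least-squares affine map fitted on paired simulated/real states, and `fixed_horizon_return` reads "genuine labels" as summing at most `horizon` discounted observed rewards with no bootstrapped value term. All names, shapes, and hyperparameters are hypothetical.

```python
# Illustrative sketch only; names, shapes, and hyperparameters are assumptions,
# not the implementation from the paper.
import numpy as np


class LinearStateAdaptor:
    """Toy state adaptor: an affine map from simulated states to matched real
    states, fitted by least squares. It only illustrates the alignment idea."""

    def __init__(self):
        self.W = None  # (state_dim + 1, state_dim) weights; bias in last row

    def fit(self, sim_states, real_states):
        # Solve real ≈ [sim, 1] @ W in the least-squares sense.
        X = np.hstack([sim_states, np.ones((len(sim_states), 1))])
        self.W, *_ = np.linalg.lstsq(X, real_states, rcond=None)
        return self

    def __call__(self, sim_state):
        # Map a single simulated state to its realigned estimate.
        return np.append(sim_state, 1.0) @ self.W


def fixed_horizon_return(rewards, gamma=0.99, horizon=10):
    """Fixed-horizon critic targets: each target sums at most `horizon`
    discounted observed rewards, with no bootstrapped value term, so the
    label depends only on data actually collected."""
    rewards = np.asarray(rewards, dtype=float)
    T = len(rewards)
    targets = np.empty(T)
    for t in range(T):
        end = min(t + horizon, T)
        discounts = gamma ** np.arange(end - t)
        targets[t] = discounts @ rewards[t:end]
    return targets


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sim = rng.normal(size=(200, 3))
    # Synthetic "plant": a mild gain/offset mismatch relative to the simulator.
    real = sim @ np.diag([1.1, 0.9, 1.05]) + 0.2
    sa = LinearStateAdaptor().fit(sim, real)
    print("adapted state:", sa(sim[0]))
    print("first targets:", fixed_horizon_return(rng.uniform(size=20), horizon=5)[:3])
```

In the paper itself the adaptor is presumably trained jointly with the PPO agent (hence SA-PPO); the affine fit above is simply the smallest stand-in that makes the sim2real alignment and the fixed-horizon labelling concrete.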