A novel sim2real reinforcement learning algorithm for process control
| Published in: | Reliability Engineering & System Safety, Volume 254, p. 110639 |
|---|---|
| Main authors: | , , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Ltd, 01.02.2025 |
| ISSN: | 0951-8320 |
| Summary: | While reinforcement learning (RL) has potential in advanced process control and optimization, its direct interaction with real industrial processes can pose safety risks. Model-based pre-training of RL may alleviate such risks. However, the intricate nature of industrial processes complicates the establishment of entirely accurate simulation models; consequently, RL-based controllers that rely on simulation models easily suffer from model-plant mismatch. Alternatively, offline data can be used to pre-train RL and likewise mitigate safety risks, but this requires well-represented historical datasets, which is demanding because industrial processes mostly run in regulatory mode under basic controllers. To handle these issues, this paper proposes a novel sim2real reinforcement learning algorithm. First, a state adaptor (SA) is proposed to align simulated states with real states and thus mitigate the model-plant mismatch. Then, a fixed-horizon return is designed to replace the traditional infinite-step return and provide genuine labels for the critic network, enhancing learning efficiency and stability. Finally, the SA-PPO method, built on proximal policy optimization (PPO), is introduced to implement the proposed sim2real algorithm. Experimental results show that SA-PPO improves performance, on average, by 1.96% in MSE and 21.64% in R for the roasting process simulation, verifying the effectiveness of the proposed method. |
| Highlights: | •A new sim2real RL method is introduced for process control. •The state adaptor significantly mitigates the impact of modeling error on RL control performance. •The fixed-horizon return enhances learning efficiency in process control. •Numerical and roasting process simulations validate the method's effectiveness. (An illustrative sketch of the state adaptor and fixed-horizon return follows the table below.) |
| DOI: | 10.1016/j.ress.2024.110639 |
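The abstract names two algorithmic ingredients, a state adaptor that aligns simulated states with real states and a fixed-horizon return that replaces the infinite-step return as the critic's label, but it does not specify how either is implemented. The snippet below is only a minimal NumPy sketch under stated assumptions, not the paper's method: `LinearStateAdaptor` stands in for the learned SA with a least-squares affine map fitted on paired simulated/real states, and `fixed_horizon_return` reads "genuine labels" as summing at most `horizon` discounted observed rewards with no bootstrapped value term. All names, shapes, and hyperparameters are hypothetical.

```python
# Illustrative sketch only; names, shapes, and hyperparameters are assumptions,
# not the implementation from the paper.
import numpy as np


class LinearStateAdaptor:
    """Toy state adaptor: an affine map from simulated states to matched real
    states, fitted by least squares. It only illustrates the alignment idea."""

    def __init__(self):
        self.W = None  # (state_dim + 1, state_dim) weights; bias in last row

    def fit(self, sim_states, real_states):
        # Solve real ≈ [sim, 1] @ W in the least-squares sense.
        X = np.hstack([sim_states, np.ones((len(sim_states), 1))])
        self.W, *_ = np.linalg.lstsq(X, real_states, rcond=None)
        return self

    def __call__(self, sim_state):
        # Map a single simulated state to its realigned estimate.
        return np.append(sim_state, 1.0) @ self.W


def fixed_horizon_return(rewards, gamma=0.99, horizon=10):
    """Fixed-horizon critic targets: each target sums at most `horizon`
    discounted observed rewards, with no bootstrapped value term, so the
    label depends only on data actually collected."""
    rewards = np.asarray(rewards, dtype=float)
    T = len(rewards)
    targets = np.empty(T)
    for t in range(T):
        end = min(t + horizon, T)
        discounts = gamma ** np.arange(end - t)
        targets[t] = discounts @ rewards[t:end]
    return targets


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sim = rng.normal(size=(200, 3))
    # Synthetic "plant": a mild gain/offset mismatch relative to the simulator.
    real = sim @ np.diag([1.1, 0.9, 1.05]) + 0.2
    sa = LinearStateAdaptor().fit(sim, real)
    print("adapted state:", sa(sim[0]))
    print("first targets:", fixed_horizon_return(rng.uniform(size=20), horizon=5)[:3])
```

In the paper itself the adaptor is presumably trained jointly with the PPO agent (hence SA-PPO); the affine fit above is simply the smallest stand-in that makes the sim2real alignment and the fixed-horizon labelling concrete.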