A reactive power optimization partially observable Markov decision process with data uncertainty using multi-agent actor-attention-critic algorithm
Saved in:
| Published in: | International journal of electrical power & energy systems, Volume 147, p. 108848 |
|---|---|
| Main Authors: | , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Ltd, 01.05.2023 |
| Subjects: | |
| ISSN: | 0142-0615, 1879-3517 |
| Online Access: | Get full text |
| Summary: | • A novel POMDP modelling method for reactive power optimization of the ADN, which fills the gap in solving the optimization problem under significant data uncertainty. • A technique for transforming the POMDP model into a belief-state model that can be readily solved by an RL algorithm. • A misestimated state probability vector to account for data uncertainty in the belief-state update process. • Multiple agents matched with multiple reactive compensation devices, and a novel multi-agent actor-attention-critic algorithm adopted to solve the proposed belief MDP model.
We present an innovative partially observable Markov decision process (POMDP) modelling method for the reactive power optimization of the active distribution network (ADN) under high penetration of distributed generation. This model is tolerant of data uncertainty. We show that the belief state space in the POMDP model corresponds to the state space in the Markov decision process (MDP) model, which allows us to apply the multi-agent actor-attention-critic (MAAC) reinforcement learning (RL) algorithm to the proposed model. This technique extracts the most effective, highest-quality information from the large historical measurement database, enhancing both the learning effectiveness of the agents and the stability of the optimization strategy. We simulate reactive power optimization on a modified IEEE 33-node ADN and a modified IEEE 123-node ADN. The simulations demonstrate the stability and economic superiority of the proposed approach under varying degrees of data uncertainty relative to previous RL algorithms based on the MDP model. They also show that the proposed POMDP model is better suited to the real operation of a partially observable distribution network than MDP models, and that the optimal strategy obtained by the proposed MAAC algorithm remains reliable as data quality deteriorates. |
|---|---|
| DOI: | 10.1016/j.ijepes.2022.108848 |
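The core transformation the abstract describes — recasting the POMDP as a belief MDP by maintaining a probability distribution over states — can be illustrated with a minimal sketch. This is the standard discrete Bayesian belief-update step, not the paper's implementation; the transition matrix, observation matrix, and state sizes below are hypothetical toy values:

```python
# A minimal belief-state update for a discrete POMDP, the standard way a
# POMDP is recast as a belief MDP so that MDP-based RL algorithms apply.
# All numbers here are illustrative, not from the paper.
import numpy as np

def belief_update(belief, action, obs, T, O):
    """One Bayesian filter step: b'(s') ∝ O[s', obs] * sum_s b(s) * T[action, s, s']."""
    predicted = belief @ T[action]        # predict: sum over prior states s
    unnormalized = predicted * O[:, obs]  # correct: weight by P(obs | s')
    return unnormalized / unnormalized.sum()

# Toy 2-state, 2-action, 2-observation POMDP.
T = np.array([[[0.9, 0.1],   # T[a, s, s']: transition probabilities
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.3, 0.7]]])
O = np.array([[0.8, 0.2],    # O[s', o]: observation probabilities
              [0.1, 0.9]])

b = np.array([0.5, 0.5])                         # uniform initial belief
b = belief_update(b, action=0, obs=1, T=T, O=O)  # belief shifts toward state 1
```

In the paper's setting, the observation model would additionally be perturbed by the misestimated state probability vector mentioned in the highlights, so the belief update itself absorbs the data uncertainty; the RL agents then act on the belief vector `b` as if it were an ordinary MDP state.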