A reactive power optimization partially observable Markov decision process with data uncertainty using multi-agent actor-attention-critic algorithm
Saved in:
| Published in: | International journal of electrical power & energy systems, Volume 147, p. 108848 |
|---|---|
| Main Authors: | , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Ltd, 01.05.2023 |
| Subjects: | |
| ISSN: | 0142-0615, 1879-3517 |
| Online Access: | Get full text |
| Summary: | • A novel POMDP modelling method for reactive power optimization of the ADN, which fills the gap in solving the optimization problem under significant data uncertainty. • A technique for transforming the POMDP model into a belief-state model that can be readily solved by an RL algorithm. • A misestimated state probability vector to account for data uncertainty in the belief-state update process. • Multiple agents matched with multiple reactive compensation devices, and a novel multi-agent actor-attention-critic algorithm adopted to solve the proposed belief MDP model.
We present an innovative partially observable Markov decision process (POMDP) modelling method for the reactive power optimization of the active distribution network (ADN) under high penetration of distributed generation. This model is tolerant of data uncertainty. We show that the belief state space in the POMDP model corresponds to the state space in the Markov decision process (MDP) model, which allows us to apply the multi-agent actor-attention-critic (MAAC) reinforcement learning (RL) algorithm to the proposed model. This technique extracts the most effective, highest-quality information from the large historical measurement database, enhancing both the learning effectiveness of the agents and the stability of the optimization strategy. We simulate reactive power optimization on a modified IEEE 33-node ADN and a modified IEEE 123-node ADN. The simulations demonstrate the stability and economic superiority of the proposed approach under varying degrees of data uncertainty relative to previous RL algorithms based on the MDP model. They also show that the proposed POMDP model is better suited to the real operation of a partially observable distribution network than MDP models, and that the optimal strategy obtained by the proposed MAAC algorithm remains reliable as data quality deteriorates. |
|---|---|
| DOI: | 10.1016/j.ijepes.2022.108848 |
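The core transformation the abstract describes — recasting the POMDP as a belief MDP by maintaining a probability distribution over states — can be illustrated with a minimal sketch. This is the standard discrete Bayesian belief-update step, not the paper's implementation; the transition matrix, observation matrix, and state sizes below are hypothetical toy values:

```python
# A minimal belief-state update for a discrete POMDP, the standard way a
# POMDP is recast as a belief MDP so that MDP-based RL algorithms apply.
# All numbers here are illustrative, not from the paper.
import numpy as np

def belief_update(belief, action, obs, T, O):
    """One Bayesian filter step: b'(s') ∝ O[s', obs] * sum_s b(s) * T[action, s, s']."""
    predicted = belief @ T[action]        # predict: sum over prior states s
    unnormalized = predicted * O[:, obs]  # correct: weight by P(obs | s')
    return unnormalized / unnormalized.sum()

# Toy 2-state, 2-action, 2-observation POMDP.
T = np.array([[[0.9, 0.1],   # T[a, s, s']: transition probabilities
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.3, 0.7]]])
O = np.array([[0.8, 0.2],    # O[s', o]: observation probabilities
              [0.1, 0.9]])

b = np.array([0.5, 0.5])                         # uniform initial belief
b = belief_update(b, action=0, obs=1, T=T, O=O)  # belief shifts toward state 1
```

In the paper's setting, the observation model would additionally be perturbed by the misestimated state probability vector mentioned in the highlights, so the belief update itself absorbs the data uncertainty; the RL agents then act on the belief vector `b` as if it were an ordinary MDP state.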