Game theory-based electricity pricing and microgrids management using online deep reinforcement learning

Bibliographic details
Published in:	Applied Soft Computing, Vol. 182, p. 113621
Main authors:	Shademan, Mahdi; Azizi, Ali; Jadid, Shahram
Format:	Journal Article
Language:	English
Published:	Elsevier B.V., 1 October 2025
Subjects:	Deep reinforcement learning; Demand response; Fuzzy chance constrained programming; Microgrids; Model-drift; Resource management
ISSN:	1568-4946

Description
Summary:	This study addresses a bi-level problem involving a retailer and multiple residential microgrids. At the upper level, the retailer broadcasts selling and buying electricity price signals to maximize its profit; at the lower level, microgrid agents manage their resources in response to these signals to minimize their costs. A distribution system operator (DSO) additionally oversees network constraints. The interaction between the microgrids and the retailer is modeled as a Stackelberg game that allows double-sided trading. To handle uncertainties in sustainable resources, loads, and wholesale market prices, a hybrid fuzzy/stochastic optimization (HFSO) approach is employed, combining fuzzy chance-constrained programming at the upper level with risk-neutral programming at the lower level. Because of privacy concerns, deep reinforcement learning is used to solve the problem. The approach is extended to online learning to counter data drift, especially when the load profile changes, and to reach an acceptable solution quickly. To support this claim, the ability to predict profit over a relatively long period is investigated for both the offline learning method and the proposed online learning method. The results show a prediction error of 15.54% for the offline method versus only 1.8% for the online method. Specifically, the online method predicts the retailer's profit with 96.75% accuracy, whereas the offline method's prediction fails, with an accuracy of −150.64%. The online method also predicts each microgrid's power transactions with more than 89.6% accuracy.
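The leader-follower structure described in the summary can be illustrated with a deliberately small sketch. It is not the authors' model: it uses a single selling price, a quadratic discomfort cost for each microgrid, and a grid search in place of deep reinforcement learning. All function names and numbers below are hypothetical.

```python
# Toy Stackelberg pricing sketch: one retailer (leader), several microgrids (followers).
# Everything here is an illustrative assumption, not the formulation used in the paper.

def follower_best_response(price, base_load, flexibility):
    """Follower minimizes price*d + (d - base_load)**2 / (2*flexibility) over d >= 0.

    Setting the derivative to zero gives d = base_load - flexibility*price,
    clipped at zero because demand cannot be negative.
    """
    return max(0.0, base_load - flexibility * price)


def retailer_profit(price, wholesale, microgrids):
    """Leader's profit after every follower has played its best response."""
    demand = sum(follower_best_response(price, b, f) for b, f in microgrids)
    return (price - wholesale) * demand


def solve_stackelberg(wholesale, microgrids, price_cap, steps=1000):
    """Leader enumerates candidate prices in [0, price_cap] and keeps the best one."""
    candidates = [price_cap * i / steps for i in range(steps + 1)]
    return max(candidates, key=lambda p: retailer_profit(p, wholesale, microgrids))


# Hypothetical followers: (base load, price sensitivity of demand).
microgrids = [(10.0, 20.0), (8.0, 10.0)]
best_price = solve_stackelberg(wholesale=0.10, microgrids=microgrids, price_cap=0.50)
print(f"leader price: {best_price:.3f}")
print(f"leader profit: {retailer_profit(best_price, 0.10, microgrids):.3f}")
```

The key point of the sketch is the ordering: the leader evaluates its profit only after substituting each follower's best response, which is what distinguishes a Stackelberg game from a simultaneous-move one. The study replaces both the closed-form follower response and the grid search with learned policies and adds buying prices, uncertainty handling, and network constraints.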
• Proposed an online-iterative DRL that adapts to load growth, enables fast decisions, and ensures data privacy.
• Used GMDH network for self-adaptive structure, making DRL retraining faster and more efficient.
• Tackled energy price/load/renewable uncertainty via hybrid fuzzy-stochastic optimization in the neural model.
• Designed a pricing scheme enabling DSO dual-sided signals while cutting wholesale market dependency.
DOI:	10.1016/j.asoc.2025.113621