Game theory-based electricity pricing and microgrids management using online deep reinforcement learning

Bibliographic Details
Published in:Applied soft computing Vol. 182; p. 113621
Main Authors: Shademan, Mahdi, Azizi, Ali, Jadid, Shahram
Format: Journal Article
Language:English
Published: Elsevier B.V. 01.10.2025
ISSN:1568-4946
Description
Summary:This study addresses a bi-level problem involving a retailer and multiple residential microgrids. The retailer, at the upper level, broadcasts selling and buying electricity price signals to maximize its profit, while microgrid agents, at the lower level, manage their resources in response to these signals to minimize their costs. Additionally, a distribution system operator oversees network constraints. The interaction between the microgrids and the retailer is modeled as a Stackelberg game that allows double-sided trading. To deal with uncertainties related to sustainable resources, loads, and wholesale market prices, a hybrid fuzzy/stochastic optimization (HFSO) approach is employed, which combines fuzzy chance-constrained programming at the upper level with risk-neutral programming at the lower level. Because of privacy concerns, a deep reinforcement learning approach is used to solve the problem. This approach is extended to online learning to counter data drift, especially when the load profile changes, and to reach an acceptable solution quickly. To support this claim, the ability to predict profit over a relatively long period is compared for the offline learning method and the proposed online learning method. The results show that the offline learning method has a prediction error of 15.54 %, while the online learning method has an error of only 1.8 %. Specifically, the online learning method predicts the retailer’s profit with 96.75 % accuracy, whereas the offline learning method’s prediction fails, with −150.64 % accuracy. The online learning method also predicts each microgrid’s power transactions with more than 89.6 % accuracy.
•Proposed an online-iterative DRL that adapts to load growth, enables fast decisions, and ensures data privacy.
•Used GMDH network for self-adaptive structure, making DRL retraining faster and more efficient.
•Tackled energy price/load/renewable uncertainty via hybrid fuzzy-stochastic optimization in the neural model.
•Designed a pricing scheme enabling DSO dual-sided signals while cutting wholesale market dependency.
DOI:10.1016/j.asoc.2025.113621