Energy Regulation-Aware Layered Control Architecture for Building Energy Systems Using Constraint-Aware Deep Reinforcement Learning and Virtual Energy Storage Modeling
| Published in: | Energies (Basel), Vol. 18, no. 17, p. 4698 |
|---|---|
| Main Authors: | , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Basel: MDPI AG, 01.09.2025 |
| Subjects: | |
| ISSN: | 1996-1073 |
| Summary: | In modern intelligent buildings, the control of Building Energy Systems (BES) faces increasing complexity in balancing energy costs, thermal comfort, and operational flexibility. Traditional centralized or flat deep reinforcement learning (DRL) methods often fail to handle the multi-timescale dynamics, large state–action spaces, and strict constraint satisfaction required by real-world energy systems. To address these challenges, this paper proposes an energy policy-aware layered control architecture that combines Virtual Energy Storage System (VESS) modeling with a novel Dynamic Constraint-Aware Policy Optimization (DCPO) algorithm. The VESS is modeled on the thermal inertia of building envelope components, quantifying flexibility in terms of virtual power, capacity, and state of charge, so that the BES behaves as if it had embedded, non-physical energy storage. Building on this, the BES control problem is structured as a hierarchical Markov Decision Process in which the upper level handles strategic decisions (e.g., VESS dispatch, HVAC modes) while the lower level manages real-time control (e.g., temperature adjustments, load balancing). The proposed DCPO algorithm extends actor–critic learning with dynamic policy constraints, entropy regularization, and adaptive clipping to ensure feasible and efficient policy learning under both operational and comfort-related constraints. Simulation experiments demonstrate that the proposed approach outperforms established algorithms such as Deep Q-Networks (DQN), Deep Deterministic Policy Gradient (DDPG), and Twin Delayed DDPG (TD3): it achieves a 32.6% reduction in operational costs and a decrease of over 51% in thermal comfort violations relative to DQN, while delivering millisecond-level policy generation suitable for real-time BES deployment. (Illustrative code sketches of the VESS model, the two-level decision structure, and the DCPO update follow the record below.) |
|---|---|
| DOI: | 10.3390/en18174698 |
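
The abstract quantifies building thermal inertia as a Virtual Energy Storage System through three quantities: virtual power, capacity, and state of charge. The sketch below shows one common way to derive those quantities from a first-order RC (resistance-capacitance) envelope model. The lumped parameters, the cooling-oriented SoC convention, and all names are assumptions made for illustration, not the paper's exact formulation.

```python
"""Minimal, illustrative VESS model built on building thermal inertia.
All symbols (c_th, r_th, the comfort band, the SoC convention) are
assumptions for this sketch; the paper's formulation may differ."""
from dataclasses import dataclass

@dataclass
class VirtualEnergyStorage:
    c_th: float    # lumped thermal capacitance of the envelope [kWh/degC]
    r_th: float    # thermal resistance to ambient [degC/kW]
    t_min: float   # lower comfort bound [degC]
    t_max: float   # upper comfort bound [degC]
    t_room: float  # current indoor temperature [degC]

    @property
    def capacity_kwh(self) -> float:
        # Virtual capacity: energy "storable" by swinging the indoor
        # temperature across the full comfort band.
        return self.c_th * (self.t_max - self.t_min)

    @property
    def soc(self) -> float:
        # Cooling convention: a colder room holds more "charge", i.e.,
        # more headroom before the upper comfort bound is violated.
        return (self.t_max - self.t_room) / (self.t_max - self.t_min)

    def virtual_power_kw(self, p_hvac: float, t_ambient: float) -> float:
        # Virtual charge/discharge power: HVAC power relative to the
        # baseline power that would just hold the room temperature steady.
        p_baseline = (t_ambient - self.t_room) / self.r_th
        return p_hvac - p_baseline

    def step(self, p_hvac: float, t_ambient: float, dt_h: float) -> None:
        # First-order RC temperature update (cooling power is positive).
        dq = (t_ambient - self.t_room) / self.r_th - p_hvac  # net heat [kW]
        self.t_room += dq * dt_h / self.c_th
```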
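The record also describes a hierarchical Markov Decision Process with a strategic upper level (VESS dispatch, HVAC modes) and a real-time lower level (temperature adjustments, load balancing). A minimal two-timescale control loop consistent with that description might look as follows; the 15-minute/1-minute periods and the `env`/policy interfaces are hypothetical names chosen for this sketch.

```python
"""Illustrative two-timescale loop for a hierarchical BES controller.
The timescales and all interfaces (slow_state, fast_state, step) are
hypothetical and exist only to show the control structure."""

UPPER_DT_MIN = 15  # strategic interval: VESS dispatch, HVAC mode selection
LOWER_DT_MIN = 1   # real-time interval: setpoint tweaks, load balancing

def run_episode(upper_policy, lower_policy, env, horizon_min=24 * 60):
    upper_action = None
    for t in range(0, horizon_min, LOWER_DT_MIN):
        if t % UPPER_DT_MIN == 0:
            # Upper level: observe the slow state (tariff, VESS state of
            # charge, forecasts) and fix a goal for the next interval.
            upper_action = upper_policy(env.slow_state())
        # Lower level: track the strategic goal under fast dynamics.
        env.step(lower_policy(env.fast_state(), upper_action))
```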
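Finally, DCPO is described as actor–critic learning extended with dynamic policy constraints, entropy regularization, and adaptive clipping. The sketch below assembles those three ingredients into a single PPO-style actor loss; it is one plausible reading of the abstract, not the published algorithm, and every name and adaptation rule here is an assumption.

```python
"""Sketch of a DCPO-style actor loss: a clipped surrogate objective with
an entropy bonus, a Lagrangian penalty on constraint violation, and a clip
range that tightens as violations grow. Illustration only."""
import torch

def dcpo_actor_loss(log_prob, old_log_prob, advantage, entropy,
                    violation, lambda_c=1.0, base_clip=0.2, ent_coef=0.01):
    ratio = torch.exp(log_prob - old_log_prob)
    # Adaptive clipping: shrink the trust region when comfort/operational
    # constraints are being violated, so policy updates grow more cautious.
    eps = base_clip / (1.0 + float(violation.mean().clamp(min=0.0)))
    surrogate = torch.min(
        ratio * advantage,
        torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantage,
    )
    # Entropy regularization sustains exploration; the penalty term drives
    # expected violation (e.g., degree-hours outside the comfort band) to 0.
    return -(surrogate.mean() + ent_coef * entropy.mean()) \
           + lambda_c * violation.mean()
```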