Bayesian optimization for hyper-parameter tuning of an improved twin delayed deep deterministic policy gradients based energy management strategy for plug-in hybrid electric vehicles
| Published in: | Applied Energy, Vol. 381, p. 125171 |
|---|---|
| Main Authors: | |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Ltd, 01.03.2025 |
| Subjects: | |
| ISSN: | 0306-2619 |
| Summary: | Hybridization and electrification of vehicles are underway to achieve net-zero emissions for road transport. Emerging deep reinforcement learning (DRL) algorithms show great promise for the efficient energy management of plug-in hybrid electric vehicles (PHEVs), as they can, in principle, approach theoretically optimal performance. However, the brittle convergence properties, high sample complexity, and hyper-parameter sensitivity of DRL algorithms remain major challenges that limit their applicability to real-world tasks. This paper proposes a novel energy management strategy (EMS) for PHEVs based on Bayesian Optimization (BO) and an improved Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, in which BO is introduced to optimize the TD3 hyper-parameters and a non-parametric reward function (NRF) is designed to improve the TD3 algorithm (BO-NRTD3). The present work addresses two challenges: (1) hyper-parameter tuning markedly improves the TD3 strategy's convergence and robustness; and (2) the NRF enables the TD3 strategy to handle system uncertainties. These findings are validated against various state-of-the-art DRL strategies and a dynamic programming (DP) benchmark using Software-in-the-Loop (SiL) and Hardware-in-the-Loop (HiL) tests. The results show that the BO-NRTD3 strategy attains up to 98.15% of the energy economy of DP and is 4.23% more robust than the parametric reward function TD3 (PR-TD3) strategy (see the code sketch below the record). |
|---|---|
| Highlights: | • Improved TD3 strategy's convergence and robustness via Bayesian Optimization. • Objective function ensures fast convergence of the reward function and its magnitude. • Non-parametric reward function enhances TD3's adaptability to system uncertainties. • BO-NRTD3 strategy validated through Software-in-the-Loop and Hardware-in-the-Loop tests. |
| DOI: | 10.1016/j.apenergy.2024.125171 |
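
The abstract describes tuning TD3 hyper-parameters with Bayesian Optimization before training the energy management policy. The paper's code and PHEV simulation environment are not part of this record, so the following is a minimal sketch of such a tuning loop, assuming scikit-optimize's Gaussian-process optimizer and Stable-Baselines3's TD3 implementation, with a standard Gym task standing in for the PHEV model; the search space, training budgets, and environment are illustrative assumptions, not the authors' settings.

```python
# Minimal sketch, NOT the authors' implementation: Bayesian optimization of
# TD3 hyper-parameters, as described in the abstract, using scikit-optimize
# (gp_minimize) and Stable-Baselines3. The PHEV energy-management environment
# and the non-parametric reward are not public, so Pendulum-v1 stands in;
# the search space and budgets below are illustrative assumptions.
import gymnasium as gym
from skopt import gp_minimize
from skopt.space import Integer, Real
from stable_baselines3 import TD3
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.monitor import Monitor

# Hyper-parameters exposed to the Gaussian-process surrogate.
SEARCH_SPACE = [
    Real(1e-5, 1e-3, prior="log-uniform", name="learning_rate"),
    Real(0.95, 0.999, name="gamma"),
    Integer(64, 512, name="batch_size"),
]

def objective(params):
    """Train a short TD3 run with the candidate hyper-parameters and
    return the negated evaluation return (gp_minimize minimizes)."""
    lr, gamma, batch_size = params
    env = Monitor(gym.make("Pendulum-v1"))  # placeholder for the PHEV model
    model = TD3("MlpPolicy", env, learning_rate=lr, gamma=gamma,
                batch_size=int(batch_size), verbose=0, seed=0)
    model.learn(total_timesteps=20_000)     # short budget for the sketch
    mean_return, _ = evaluate_policy(model, env, n_eval_episodes=5)
    return -float(mean_return)

# 15 BO iterations: roughly 10 random probes, then GP-guided proposals.
result = gp_minimize(objective, SEARCH_SPACE, n_calls=15, random_state=0)
print("best hyper-parameters:", result.x, "best return:", -result.fun)
```

In the paper's setting, the objective would instead score the EMS over a full driving-cycle simulation (e.g., combined fuel and electricity cost, together with the convergence behaviour the highlights mention), with BO trading off exploration of untried hyper-parameter settings against exploitation of promising ones.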