Bayesian optimization for hyper-parameter tuning of an improved twin delayed deep deterministic policy gradients based energy management strategy for plug-in hybrid electric vehicles

Published in: Applied Energy, Volume 381, Article 125171
Main authors: Wang, Jinhai, Du, Changqing, Yan, Fuwu, Hua, Min, Gongye, Xiangyu, Yuan, Quan, Xu, Hongming, Zhou, Quan
Medium: Journal Article
Language: English
Publication details: Elsevier Ltd, 01.03.2025
ISSN: 0306-2619
Description
Summary: Hybridization and electrification of vehicles are underway to achieve net-zero emissions for road transport. Emerging deep reinforcement learning (DRL) algorithms show great promise for the efficient energy management of plug-in hybrid electric vehicles (PHEVs), as they offer the potential to approach theoretically optimal performance. However, the brittle convergence properties, high sample complexity, and hyper-parameter sensitivity of DRL algorithms have been major challenges in this field, limiting the applicability of DRL to real-world tasks. This paper proposes a novel energy management strategy (EMS) for PHEVs based on Bayesian Optimization (BO) and an improved Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, in which BO is introduced to optimize the TD3 hyper-parameters and a non-parametric reward function (NRF) is designed to improve the TD3 algorithm (BO-NRTD3). The present work addresses two challenges: (1) through hyper-parameter tuning, the TD3 strategy's brittle convergence is mitigated and its robustness is significantly improved; and (2) through the NRF, the TD3 strategy can tackle system uncertainties. These findings are validated against various state-of-the-art DRL strategies and a dynamic programming (DP) benchmark using Software-in-the-Loop (SiL) and Hardware-in-the-Loop (HiL) tests. The results show that the energy economy of the BO-NRTD3 strategy reaches up to 98.15% of the DP benchmark and is 4.23% more robust than the parametric reward function TD3 (PR-TD3) strategy.

Highlights:
•Improved TD3 strategy's convergence and robustness via Bayesian Optimization.
•Objective function ensures fast convergence of the reward function and its magnitude.
•Non-parametric reward function enhances TD3's adaptability to system uncertainties.
•BO-NRTD3 strategy validated through Software-in-the-Loop and Hardware-in-the-Loop tests.
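The abstract above does not include the article's implementation, but the tuning loop it describes (Gaussian-process Bayesian optimization over TD3 hyper-parameters) can be illustrated with a minimal, generic Python sketch. This is not the paper's method: the search space, the scikit-optimize library, and the `train_and_evaluate_td3` routine (here replaced by a synthetic surrogate so the sketch runs standalone) are all hypothetical stand-ins.

```python
# A minimal sketch of BO-based TD3 hyper-parameter tuning, assuming
# scikit-optimize is installed; ranges and the trainer are illustrative only.
import math
from skopt import gp_minimize
from skopt.space import Real, Integer

# Hypothetical search space; the paper's actual ranges are not given here.
search_space = [
    Real(1e-5, 1e-3, prior="log-uniform", name="actor_lr"),
    Real(1e-5, 1e-3, prior="log-uniform", name="critic_lr"),
    Real(0.95, 0.999, name="gamma"),       # discount factor
    Integer(64, 512, name="batch_size"),   # replay mini-batch size
]

def train_and_evaluate_td3(actor_lr, critic_lr, gamma, batch_size):
    """Hypothetical stand-in: a real implementation would train a TD3 agent
    on the PHEV energy-management environment and return its average return.
    A synthetic quadratic surrogate keeps this sketch self-contained."""
    return -((math.log10(actor_lr) + 4.0) ** 2
             + (math.log10(critic_lr) + 4.0) ** 2
             + 100.0 * (gamma - 0.99) ** 2
             + (batch_size - 256) ** 2 / 1e4)

def objective(params):
    """Cost seen by the optimizer: negate the return so BO minimizes it."""
    actor_lr, critic_lr, gamma, batch_size = params
    return -train_and_evaluate_td3(actor_lr, critic_lr, gamma, batch_size)

# Gaussian-process surrogate with an expected-improvement acquisition function.
result = gp_minimize(objective, search_space, n_calls=30, acq_func="EI",
                     random_state=0)
print("best hyper-parameters:", result.x)
print("best cost:", result.fun)
```

In practice each `objective` call would wrap a full TD3 training run over the driving-cycle simulation, which is why sample-efficient BO is preferred over grid or random search for this kind of tuning.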
DOI: 10.1016/j.apenergy.2024.125171