Bayesian optimization for hyper-parameter tuning of an improved twin delayed deep deterministic policy gradients based energy management strategy for plug-in hybrid electric vehicles

Bibliographic Details
Published in: Applied Energy, Vol. 381, Article 125171
Authors: Wang, Jinhai; Du, Changqing; Yan, Fuwu; Hua, Min; Gongye, Xiangyu; Yuan, Quan; Xu, Hongming; Zhou, Quan
Format: Journal Article
Language: English
Published: Elsevier Ltd, 1 March 2025
ISSN: 0306-2619
Abstract
Hybridization and electrification of vehicles are underway to achieve net-zero emissions for road transport. Emerging deep reinforcement learning (DRL) algorithms show great promise for the efficient energy management of plug-in hybrid electric vehicles (PHEVs), as they offer the potential to approach theoretically optimal performance. However, brittle convergence properties, high sample complexity, and sensitivity to hyper-parameters have been major challenges for DRL algorithms, limiting their applicability to real-world tasks. This paper proposes a novel energy management strategy (EMS) for PHEVs based on Bayesian Optimization (BO) and an improved Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, in which BO is introduced to optimize the TD3 hyper-parameters and a non-parametric reward function (NRF) is designed to improve the TD3 algorithm (BO-NRTD3). The present work addresses two challenges: (1) hyper-parameter tuning significantly improves the TD3 strategy's convergence and robustness; and (2) the NRF enables the TD3 strategy to tackle system uncertainties. These findings are validated against several state-of-the-art DRL strategies and dynamic programming (DP) using Software-in-the-Loop (SiL) and Hardware-in-the-Loop (HiL) tests. The results show that the energy economy of the BO-NRTD3 strategy reaches up to 98.15% of the DP benchmark and is 4.23% more robust than the parametric-reward-function TD3 (PR-TD3) strategy.
Highlights:
•Improved the TD3 strategy's convergence and robustness via Bayesian Optimization.
•An objective function ensures fast convergence of the reward function and its magnitude.
•The non-parametric reward function enhances TD3's adaptability to system uncertainties.
•The BO-NRTD3 strategy is validated through Software-in-the-Loop and Hardware-in-the-Loop tests.
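The record's abstract gives no implementation detail for the BO tuning loop, so below is a minimal sketch of how TD3 hyper-parameters could be optimized with a Gaussian-process surrogate, assuming scikit-optimize (skopt). The search space, its ranges, and the evaluate_td3_ems objective are hypothetical placeholders, not the authors' actual setup; in the paper the objective would be a full TD3 training run on the PHEV model, scored for convergence speed and final reward, whereas here a synthetic surface keeps the sketch runnable.

from skopt import gp_minimize
from skopt.space import Real, Integer
from skopt.utils import use_named_args
import math

# Illustrative search space; the paper's actual hyper-parameters and
# ranges are not given in this record, so these bounds are assumptions.
space = [
    Real(1e-5, 1e-2, prior="log-uniform", name="actor_lr"),
    Real(1e-5, 1e-2, prior="log-uniform", name="critic_lr"),
    Real(0.90, 0.999, name="gamma"),                    # discount factor
    Real(1e-3, 1e-1, prior="log-uniform", name="tau"),  # soft target-update rate
    Integer(64, 512, name="batch_size"),
    Integer(1, 4, name="policy_delay"),                 # TD3 delayed policy updates
]

def evaluate_td3_ems(actor_lr, critic_lr, gamma, tau, batch_size, policy_delay):
    # Placeholder objective so the sketch runs end-to-end. In practice this
    # would train the TD3 agent on the PHEV energy-management environment
    # and return a scalar cost (e.g. penalized negative energy economy).
    return ((math.log10(actor_lr) + 3.5) ** 2
            + (math.log10(critic_lr) + 3.0) ** 2
            + 100.0 * (gamma - 0.99) ** 2
            + (math.log10(tau) + 2.3) ** 2
            + abs(batch_size - 256) / 256.0
            + abs(policy_delay - 2))

@use_named_args(space)
def objective(**hp):
    return evaluate_td3_ems(**hp)

# Gaussian-process surrogate with skopt's default acquisition strategy;
# each call fits the GP to past evaluations and proposes the next trial.
result = gp_minimize(objective, space, n_calls=30, random_state=0)
print("best hyper-parameters:", result.x)
print("best cost:", result.fun)

On the real problem each objective call is expensive (a full training run), which is exactly the regime where GP-based BO is preferred over grid or random search: the surrogate concentrates trials in promising regions of the hyper-parameter space.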
DOI: 10.1016/j.apenergy.2024.125171