Bayesian optimization for hyper-parameter tuning of an improved twin delayed deep deterministic policy gradients based energy management strategy for plug-in hybrid electric vehicles
| Published in: | Applied Energy, Vol. 381, p. 125171 |
|---|---|
| Main Authors: | |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Ltd, 01.03.2025 |
| Subjects: | |
| ISSN: | 0306-2619 |
| Summary: | Hybridization and electrification of vehicles are underway to achieve net-zero emissions for road transport. Emerging deep reinforcement learning (DRL) algorithms show great promise for the efficient energy management of plug-in hybrid electric vehicles (PHEVs), as they can, in principle, approach theoretically optimal performance. However, the brittle convergence properties, high sample complexity, and hyper-parameter sensitivity of DRL algorithms remain major challenges that limit their applicability to real-world tasks. This paper proposes a novel energy management strategy (EMS) for PHEVs based on Bayesian Optimization (BO) and an improved Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, in which BO is introduced to optimize the TD3 hyper-parameters and a non-parametric reward function (NRF) is designed to improve the TD3 algorithm (BO-NRTD3). The present work addresses two challenges: (1) hyper-parameter tuning markedly improves the TD3 strategy's convergence and robustness; and (2) the NRF enables the TD3 strategy to handle system uncertainties. These findings are validated against various state-of-the-art DRL strategies and a dynamic programming (DP) benchmark using Software-in-the-Loop (SiL) and Hardware-in-the-Loop (HiL) tests. The results show that the BO-NRTD3 strategy attains up to 98.15% of the energy economy of DP and is 4.23% more robust than the parametric reward function TD3 (PR-TD3) strategy (see the code sketch below the record). |
|---|---|
| Highlights: | • Improved TD3 strategy's convergence and robustness via Bayesian Optimization. • Objective function ensures fast convergence of the reward function and its magnitude. • Non-parametric reward function enhances TD3's adaptability to system uncertainties. • BO-NRTD3 strategy validated through Software-in-the-Loop and Hardware-in-the-Loop tests. |
| DOI: | 10.1016/j.apenergy.2024.125171 |
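
The abstract describes tuning TD3 hyper-parameters with Bayesian Optimization before training the energy management policy. The paper's code and PHEV simulation environment are not part of this record, so the following is a minimal sketch of such a tuning loop, assuming scikit-optimize's Gaussian-process optimizer and Stable-Baselines3's TD3 implementation, with a standard Gym task standing in for the PHEV model; the search space, training budgets, and environment are illustrative assumptions, not the authors' settings.

```python
# Minimal sketch, NOT the authors' implementation: Bayesian optimization of
# TD3 hyper-parameters, as described in the abstract, using scikit-optimize
# (gp_minimize) and Stable-Baselines3. The PHEV energy-management environment
# and the non-parametric reward are not public, so Pendulum-v1 stands in;
# the search space and budgets below are illustrative assumptions.
import gymnasium as gym
from skopt import gp_minimize
from skopt.space import Integer, Real
from stable_baselines3 import TD3
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.monitor import Monitor

# Hyper-parameters exposed to the Gaussian-process surrogate.
SEARCH_SPACE = [
    Real(1e-5, 1e-3, prior="log-uniform", name="learning_rate"),
    Real(0.95, 0.999, name="gamma"),
    Integer(64, 512, name="batch_size"),
]

def objective(params):
    """Train a short TD3 run with the candidate hyper-parameters and
    return the negated evaluation return (gp_minimize minimizes)."""
    lr, gamma, batch_size = params
    env = Monitor(gym.make("Pendulum-v1"))  # placeholder for the PHEV model
    model = TD3("MlpPolicy", env, learning_rate=lr, gamma=gamma,
                batch_size=int(batch_size), verbose=0, seed=0)
    model.learn(total_timesteps=20_000)     # short budget for the sketch
    mean_return, _ = evaluate_policy(model, env, n_eval_episodes=5)
    return -float(mean_return)

# 15 BO iterations: roughly 10 random probes, then GP-guided proposals.
result = gp_minimize(objective, SEARCH_SPACE, n_calls=15, random_state=0)
print("best hyper-parameters:", result.x, "best return:", -result.fun)
```

In the paper's setting, the objective would instead score the EMS over a full driving-cycle simulation (e.g., combined fuel and electricity cost, together with the convergence behaviour the highlights mention), with BO trading off exploration of untried hyper-parameter settings against exploitation of promising ones.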