Deep deterministic policy gradient algorithm based on dung beetle optimization and priority experience replay mechanism

Published in: Scientific Reports, Vol. 15, No. 1, Art. 13978 (14 pages)
Main authors: Zhu, Hengwei; Rong, Chuiting; Liu, Haorui
Format: Journal Article
Language: English
Published: London: Nature Publishing Group UK, 22 Apr 2025 (Nature Portfolio)
ISSN: 2045-2322
Description
Summary: Reinforcement learning algorithms that handle continuous action spaces suffer from slow convergence and convergence to local optima. We therefore propose a deep deterministic policy gradient algorithm based on the dung beetle optimization algorithm and a priority experience replay mechanism (DBOP–DDPG). The method first introduces the dung beetle optimizer (DBO) to search with multiple populations simultaneously, which effectively keeps the algorithm from settling in a local optimum and improves its global optimization capability. We then design a criterion for prioritizing sample data and improve experience replay sampling: samples are stored in three replay stores according to their importance and drawn from those stores during training, which speeds up convergence. Finally, tests in three classic control environments from OpenAI Gym show that the proposed method converges at least 10% faster than the comparison algorithms and increases the cumulative reward by up to 150.
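
The abstract describes storing samples in three replay stores by importance and sampling from them preferentially, but this record does not state the priority criterion, the thresholds, or the sampling ratios. The Python sketch below is one plausible reading, not the paper's implementation: it assumes TD-error magnitude as the importance measure, and the thresholds (hi, lo) and tier weights are hypothetical placeholders.

    import random
    from collections import deque

    import numpy as np

    class TieredReplayBuffer:
        """Three replay stores ordered by importance; sampling favors the
        more important stores. Thresholds and weights are illustrative."""

        def __init__(self, capacity=100_000, hi=1.0, lo=0.1,
                     weights=(0.6, 0.3, 0.1)):
            self.tiers = [deque(maxlen=capacity) for _ in range(3)]  # high, mid, low
            self.hi, self.lo = hi, lo                   # assumed TD-error thresholds
            self.weights = np.asarray(weights, float)   # assumed tier sampling ratios

        def add(self, transition, td_error):
            # Route each transition by TD-error magnitude, a common proxy
            # for sample importance (the paper's exact criterion is not
            # given in this record).
            err = abs(td_error)
            tier = 0 if err >= self.hi else 1 if err >= self.lo else 2
            self.tiers[tier].append(transition)

        def sample(self, batch_size):
            # Split the minibatch across non-empty tiers in proportion to
            # their weights, then sample with replacement inside each tier.
            probs = self.weights * np.array([len(t) > 0 for t in self.tiers])
            if probs.sum() == 0:
                raise ValueError("all replay tiers are empty")
            probs /= probs.sum()
            counts = np.random.multinomial(batch_size, probs)
            batch = []
            for tier, n in zip(self.tiers, counts):
                if n:
                    batch.extend(random.choices(list(tier), k=int(n)))
            return batch

    # Example: a high-error transition lands in the top tier and is
    # therefore sampled more often during training.
    buf = TieredReplayBuffer()
    buf.add(("s", "a", 1.0, "s_next", False), td_error=2.3)
    print(buf.sample(4))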
DOI: 10.1038/s41598-025-99213-3