Relative Entropy of Correct Proximal Policy Optimization Algorithms with Modified Penalty Factor in Complex Environment

In the field of reinforcement learning, we propose a Correct Proximal Policy Optimization (CPPO) algorithm based on a modified penalty factor β and relative entropy, in order to address the robustness and stationarity problems of traditional algorithms. Firstly, in the process of reinforcement learning, this...

Bibliographic Details
Published in: Entropy (Basel, Switzerland) Vol. 24; no. 4; p. 440
Main Authors: Chen, Weimin, Wong, Kelvin Kian Loong, Long, Sifan, Sun, Zhili
Format: Journal Article
Language: English
Published: Switzerland MDPI AG 22.03.2022
ISSN: 1099-4300