Optimization of reward shaping function based on genetic algorithm applied to a cross validated deep deterministic policy gradient in a powered landing guidance problem

Bibliographic Details
Published in: Engineering Applications of Artificial Intelligence, Vol. 120, Article 105798
Main authors: Nugroho, Larasmoyo, Andiarti, Rika, Akmeliawati, Rini, Kutay, Ali Türker, Larasati, Diva Kartika, Wijaya, Sastra Kusuma
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.04.2023
ISSN: 0952-1976, 1873-6769
Online access: Full text
Description
Abstract: A key capability of a Deep Reinforcement Learning (DRL) agent, namely controlling a vehicle in an environment without any prior knowledge, rests on decision-making driven by a well-designed reward shaping function. An important but little-studied factor that can significantly alter the training reward score and the performance outcomes is the shape of that function. To maximize the control efficacy of a DRL algorithm, an optimized reward shaping function and a solid hyperparameter combination are essential. To achieve optimal control during the powered descent guidance (PDG) landing phase of a reusable launch vehicle, this paper couples the Deep Deterministic Policy Gradient (DDPG) algorithm with a genetic algorithm (GA) that searches for the best shape of the reward shaping function (RSF). Although DDPG is quite capable of managing complex environments and producing actions for continuous spaces, its state and action performance can still be improved. A reference DDPG agent with the original reward shaping function and a PID controller were compared side by side with a GA-DDPG agent using the GA-optimized RSF. Aided by the potential-based GA (PbGA) searched RSF, the best GA-DDPG individual maximizes overall rewards and minimizes state errors, maintaining the highest fitness score among all individuals after being cross-validated and retested extensively in Monte-Carlo experiments.
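
The record does not include the paper's implementation, but the mechanism named in the abstract, a GA searching over potential-based reward shaping functions (PBRS, where the shaping term takes the form F(s, s') = gamma*Phi(s') - Phi(s)), can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the three-component landing state error, the weighted-potential form of Phi, and the random-walk rollout that stands in for actual DDPG training. None of it is the authors' code.

    # Minimal sketch: GA search over potential-based reward shaping weights.
    # Hypothetical names throughout; the rollout is a stub, not a real
    # DDPG training loop or landing environment.
    import random

    GAMMA = 0.99  # discount factor shared by the shaping term and the agent

    def potential(state, weights):
        """Phi(s): weighted penalty on state errors (altitude, velocity, attitude)."""
        return -sum(w * abs(e) for w, e in zip(weights, state))

    def shaped_reward(base_reward, s, s_next, weights):
        """PBRS: r' = r + gamma*Phi(s') - Phi(s)."""
        return base_reward + GAMMA * potential(s_next, weights) - potential(s, weights)

    def fitness(weights, episodes=5):
        """Stand-in for 'train a DDPG agent with this RSF, return mean reward'.
        A shrinking random walk replaces the real environment."""
        total = 0.0
        for _ in range(episodes):
            s = (100.0, 10.0, 0.5)  # initial (altitude, velocity, attitude) errors
            for _ in range(50):
                s_next = tuple(max(0.0, e - random.uniform(0, e * 0.1)) for e in s)
                total += shaped_reward(-1.0, s, s_next, weights)
                s = s_next
        return total / episodes

    def ga_search(pop_size=20, generations=30, mutation_rate=0.2):
        """Plain GA: elitist selection, uniform crossover, Gaussian mutation."""
        pop = [[random.uniform(0, 1) for _ in range(3)] for _ in range(pop_size)]
        for _ in range(generations):
            elite = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
            children = []
            while len(children) < pop_size - len(elite):
                a, b = random.sample(elite, 2)
                child = [random.choice(pair) for pair in zip(a, b)]  # uniform crossover
                if random.random() < mutation_rate:
                    i = random.randrange(len(child))
                    child[i] = max(0.0, child[i] + random.gauss(0, 0.1))  # mutate one gene
                children.append(child)
            pop = elite + children
        return max(pop, key=fitness)

    if __name__ == "__main__":
        print("Best potential weights found:", ga_search())

Because PBRS only adds a difference of potentials along each transition, it is known to leave the optimal policy of the underlying problem unchanged (Ng et al., 1999), so a GA can tune the potential weights freely without redefining what an optimal landing is.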
DOI: 10.1016/j.engappai.2022.105798