A Modified ALOS Method of Path Tracking for AUVs with Reinforcement Learning Accelerated by Dynamic Data-Driven AUV Model

Bibliographic details
Published in: Journal of intelligent & robotic systems, Volume 104, Issue 3, p. 49
Main authors: Wang, Dianrui, He, Bo, Shen, Yue, Li, Guangliang, Chen, Guanzhong
Format: Journal Article
Language: English
Published: Dordrecht, Springer Netherlands, 1 March 2022
Springer
Springer Nature B.V
ISSN: 0921-0296, 1573-0409
Description
Summary: Path tracking has a significant impact on the success of long-term autonomous underwater vehicle (AUV) missions in terms of safety, energy saving, and efficiency. However, it is a challenging problem because of model uncertainty and ocean current disturbance. Moreover, the widely used line-of-sight (LOS) algorithm with a fixed lookahead distance does not perform well, so there is an urgent need for automatic adjustment of this parameter. Considering the above, this study proposes an adaptive line-of-sight (ALOS) guidance method with reinforcement learning (RL) based on the dynamic data-driven AUV model (DDDAM). First, we introduced a detailed AUV dynamic model, covering the cases with and without current influence. Next, we conducted a detailed analysis of the path tracking error dynamics and of the factors influencing tracking performance, based on this model. We then used the DDDAM (built on a long short-term memory (LSTM) neural network) to pre-train the RL framework, generating more samples for online learning in order to speed up the learning process. Finally, deterministic policy gradient (DPG) based RL was designed to optimize the continuously varying lookahead distance, taking the previously analyzed factors into account. The paper presents simulation cases and an evaluation of the algorithm. The results indicate that the proposed method significantly improves path tracking performance in terms of effectiveness and robustness.
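
To illustrate why the lookahead distance mentioned in the summary matters, the following is a minimal sketch of the classic lookahead-based LOS heading law for a straight path segment. It is not the ALOS or RL method of the paper. The function name `los_heading`, the sign convention for the cross-track error, and the sample values are assumptions made for illustration only.

```python
import math


def los_heading(cross_track_error, path_angle, lookahead):
    """Desired heading (rad) from the textbook lookahead-based LOS law.

    Assumed convention (not taken from the paper): a positive
    cross_track_error means the vehicle lies to the left of the path,
    so the correction term steers it back toward the path.
    """
    if lookahead <= 0:
        raise ValueError("lookahead distance must be positive")
    return path_angle + math.atan2(-cross_track_error, lookahead)


# Same 5 m cross-track error on a path along the x-axis (path_angle = 0),
# evaluated for three hypothetical fixed lookahead distances.
for delta in (2.0, 10.0, 50.0):
    psi_d = los_heading(5.0, 0.0, delta)
    print(f"lookahead = {delta:5.1f} m, desired heading = {math.degrees(psi_d):7.2f} deg")
```

In this sketch a short lookahead distance commands a sharp turn back toward the path, while a long one commands a gentle correction. No single fixed value suits every combination of tracking error and disturbance, which is the motivation the summary gives for adjusting the parameter automatically.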
DOI: 10.1007/s10846-021-01504-0