Autonomous navigation and collision avoidance for unmanned surface vehicle based on TD3-PD algorithm with CNN-GRU network


Full Description

Bibliographic Details
Published in:	Ocean Engineering, Vol. 341, p. 122633
Main Authors:	Wei, Zhengfeng; Wang, Qingling
Format:	Journal Article
Language:	English
Published:	Elsevier Ltd, 1 December 2025
Subjects:	Gated recurrent unit; Proportional derivative controller; Reinforcement learning; Twin delay deep deterministic policy gradient; Unmanned surface vessel
ISSN:	0029-8018
Online Access:	Full text

Description
Summary:
• A novel state space, action space, and heuristic reward function are developed, leveraging radar sensor data to capture local obstacle information. This design enhances the ability of an unmanned surface vehicle (USV) to acquire information in unknown environments and improves its interaction with dynamic surroundings.
• A high-performance fusion network is proposed by integrating convolutional neural networks (CNN) and gated recurrent units (GRU) within the twin delayed deep deterministic policy gradient (TD3) framework. The resulting CGTD3 algorithm efficiently utilizes historical interaction data, accelerating convergence and optimizing overall performance.
• A novel action correction mechanism is introduced by incorporating a proportional-derivative (PD) feedback controller into the TD3 algorithm’s action outputs. This mechanism compensates for reinforcement learning limitations, improving exploration efficiency, output accuracy, and decision-making precision.
• A multi-stage training strategy is proposed, consisting of a foundational learning phase for target tracking and an online optimization phase for obstacle avoidance. Comprehensive testing in diverse simulation environments demonstrates its superior performance in complex scenarios.

This paper addresses the challenges of navigation and obstacle avoidance faced by an unmanned surface vehicle (USV) in complex environments through the application of the twin delayed deep deterministic policy gradient (TD3) algorithm. To improve the algorithm’s efficiency, the convolutional neural network (CNN) is integrated with the gated recurrent unit (GRU), resulting in a CNN-GRU-TD3 (CGTD3) algorithm, which maps system states directly to control commands in an end-to-end manner. Furthermore, the introduction of a proportional-derivative (PD) feedback controller leads to the development of the CGTD3-PD algorithm, which improves both the accuracy and robustness of the control system’s action outputs.
For enhanced performance, a multi-stage training framework is introduced, comprising foundational learning followed by online optimization. During the foundational learning phase, a model with strong generalization capabilities is developed, while the online optimization phase focuses on iteratively refining the navigation strategy. Simulation results demonstrate that the proposed algorithm significantly enhances the navigation and obstacle avoidance performance of USVs in complex scenarios.
DOI:	10.1016/j.oceaneng.2025.122633
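
The summary describes a PD feedback controller that corrects the TD3 policy's action outputs, but it does not say how the PD term is computed or combined with the policy action. The following is only a minimal sketch of one plausible form, assuming a single steering action normalized to [-1, 1], a heading error toward the target as the controlled quantity, and an additive blend with clipping. The class name `PDCorrector`, the gains `kp` and `kd`, the blend `weight`, and the step size `dt` are all hypothetical and not taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import Optional


def wrap_angle(angle: float) -> float:
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


@dataclass
class PDCorrector:
    """Hypothetical PD correction applied on top of a policy action.

    All parameters are illustrative assumptions, not values from the paper.
    """
    kp: float = 0.8          # proportional gain on heading error (assumed)
    kd: float = 0.2          # derivative gain on heading error rate (assumed)
    weight: float = 0.5      # how strongly the PD term shifts the action (assumed)
    dt: float = 0.1          # control step in seconds (assumed)
    prev_error: Optional[float] = None

    def correct(self, policy_action: float, heading: float,
                target_bearing: float) -> float:
        """Return the policy action shifted by a PD term and clipped to [-1, 1]."""
        error = wrap_angle(target_bearing - heading)
        if self.prev_error is None:
            # No history on the first step, so skip the derivative term.
            d_error = 0.0
        else:
            # Wrap the difference so a jump across +/-pi is not seen as a spike.
            d_error = wrap_angle(error - self.prev_error) / self.dt
        self.prev_error = error

        pd_term = self.kp * error + self.kd * d_error
        blended = policy_action + self.weight * pd_term
        return max(-1.0, min(1.0, blended))
```

In a training loop, a corrector like this would sit between the actor network and the environment, so the PD term nudges early, poorly trained actions toward the target while the final clip keeps the command inside the actuator range. Whether the paper blends additively, and on which error signal, is not stated in this record.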