Dynamic collision avoidance for maritime autonomous surface ships based on deep Q-network with velocity obstacle method
| Published in: | Ocean Engineering, Vol. 320, p. 120335 |
|---|---|
| Main Authors: | , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Ltd, 15.03.2025 |
| ISSN: | 0029-8018 |
| Summary: | To address the challenge of collision avoidance for maritime autonomous surface ships (MASS) in dynamic obstacle environments, a decision-making method based on the deep Q-network (DQN) and the velocity obstacle (VO) algorithm is proposed. First, the encounter-situation identification criteria are optimized, and a method for generating random collision scenarios is designed. The model's performance is evaluated comprehensively on a wide variety of randomly generated collision scenarios, which provide a broader assessment than manually set scenarios. Furthermore, a complete reward function for the dynamic collision avoidance problem is proposed, which combines ship collision risk, the velocity obstacle method, and the International Regulations for Preventing Collisions at Sea (COLREGs). This reward function not only guides the MASS toward its target but also ensures compliance with COLREGs during the collision avoidance process. Notably, the trained model does not require retraining when faced with different numbers of target ships (TS). Simulation experiments with the trained model involve random encounters with 1 to 10 TS in open waters. The results indicate that the proposed method achieves better collision avoidance performance than the DQN and proximal policy optimization algorithms. |
|---|---|
| Highlights: | • A new DQN-VO method improves the DQN reward function using the VO method. • A state space for dynamic collision avoidance handles a varying number of TS without retraining. • Optimized criteria for encounter situations and a scenario-generation process are developed. |
| DOI: | 10.1016/j.oceaneng.2025.120335 |
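
The record gives no implementation details, but the velocity obstacle test that the reward function builds on is standard and can be sketched. The following is a minimal illustration, not the paper's code: a candidate own-ship velocity is flagged as unsafe when the relative velocity with respect to a target ship (TS) points into the collision cone subtended by that TS's safety disc. The function name, parameter names, and safety-radius parameterization are assumptions for illustration.

```python
import numpy as np

def in_velocity_obstacle(p_own, v_cand, p_ts, v_ts, r_safe):
    """True if the candidate own-ship velocity lies inside the velocity
    obstacle induced by a target ship (TS): holding v_cand would bring
    the ships closer than the combined safety radius r_safe."""
    rel_pos = np.asarray(p_ts, float) - np.asarray(p_own, float)
    dist = np.linalg.norm(rel_pos)
    if dist <= r_safe:                  # already inside the safety zone
        return True
    rel_vel = np.asarray(v_cand, float) - np.asarray(v_ts, float)
    speed = np.linalg.norm(rel_vel)
    if speed == 0.0:                    # no relative motion, no collision
        return False
    half_angle = np.arcsin(r_safe / dist)           # collision-cone half-angle
    cos_theta = np.dot(rel_vel, rel_pos) / (speed * dist)
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    return theta < half_angle           # relative velocity aims into the cone
```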
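Likewise, the reward shaping described in the abstract (goal guidance, ship collision risk via the VO term, and COLREGs compliance) might be combined as below. The weights and the COLREGs-violation flag are hypothetical placeholders; the paper's actual terms and coefficients are not given in this record. The sketch reuses `in_velocity_obstacle` from the block above.

```python
def shaped_reward(p_own, v_own, goal, ts_list, r_safe, colregs_violation):
    """Hypothetical reward combining goal progress, a VO-based collision
    penalty, and a COLREGs penalty. Weights are illustrative only.
    Assumes numpy (np) and in_velocity_obstacle from the sketch above."""
    w_goal, w_vo, w_colregs = 1.0, -2.0, -1.0
    to_goal = np.asarray(goal, float) - np.asarray(p_own, float)
    to_goal = to_goal / max(np.linalg.norm(to_goal), 1e-9)
    r = w_goal * float(np.dot(v_own, to_goal))      # progress toward goal
    for p_ts, v_ts in ts_list:                      # one penalty per risky TS
        if in_velocity_obstacle(p_own, v_own, p_ts, v_ts, r_safe):
            r += w_vo
    if colregs_violation:                           # rule check done by caller
        r += w_colregs
    return r
```

The abstract's claim that the trained model handles 1 to 10 TS without retraining suggests a fixed-size observation, e.g. encoding only the K nearest TS and padding unused slots; the record does not specify the actual scheme.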