Dynamic collision avoidance for maritime autonomous surface ships based on deep Q-network with velocity obstacle method

Bibliographic details
Published in: Ocean Engineering, Vol. 320, p. 120335
Main authors: Li, Yuqin; Wu, Defeng; Wang, Hongdong; Lou, Jiankun
Format: Journal Article
Language: English
Published: Elsevier Ltd, 15 March 2025
ISSN: 0029-8018
Description
Summary: To address the challenge of collision avoidance for maritime autonomous surface ships (MASS) in dynamic-obstacle environments, a decision-making method based on the deep Q-network (DQN) and the velocity obstacle (VO) algorithm is proposed. First, the encounter situation identification criteria are optimized, and a method for generating random collision scenarios is designed. The model's performance is evaluated on a wide variety of randomly generated collision scenarios, which provide a broader assessment than manually set scenarios. Furthermore, a complete reward function for the dynamic collision avoidance problem is proposed, which combines ship collision risk, the velocity obstacle method, and the International Regulations for Preventing Collisions at Sea (COLREGs). This reward function not only guides the MASS towards the target but also ensures that it complies with COLREGs during the collision avoidance process. Notably, the trained model does not require retraining when faced with different numbers of target ships (TS). Simulation experiments are conducted with the trained model, involving random encounters with 1 to 10 TS in open waters. The results indicate that the proposed method achieves better collision avoidance performance than the DQN and proximal policy optimization algorithms.
• A new DQN-VO method improves the DQN reward function using the VO method.
• A state space for dynamic collision avoidance handles varying numbers of TS without retraining.
• Optimized criteria for encounter scenarios and a scene generation process are developed.
DOI: 10.1016/j.oceaneng.2025.120335
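
The summary mentions the velocity obstacle (VO) method only at a high level. As a rough illustration, the following Python sketch shows the textbook geometric VO test for a single target ship: the own ship's velocity is "inside" the obstacle when the relative velocity points into the collision cone around the line of sight. This is not the authors' implementation. The function name `in_velocity_obstacle`, the circular safety domain with radius `safe_radius`, and the flat 2-D coordinates are all assumptions made for illustration, and this record does not say how the paper folds such a test into the DQN reward.

```python
import math


def in_velocity_obstacle(own_pos, own_vel, ts_pos, ts_vel, safe_radius):
    """Return True if the own ship's velocity lies inside the velocity
    obstacle induced by one target ship (TS).

    Hypothetical helper: positions and velocities are (x, y) tuples in
    consistent units; safe_radius is an assumed circular safety domain.
    """
    # Position of the TS relative to the own ship (line of sight).
    rx, ry = ts_pos[0] - own_pos[0], ts_pos[1] - own_pos[1]
    # Velocity of the own ship relative to the TS.
    vx, vy = own_vel[0] - ts_vel[0], own_vel[1] - ts_vel[1]

    dist = math.hypot(rx, ry)
    if dist <= safe_radius:
        return True  # already inside the assumed safety domain

    rel_speed = math.hypot(vx, vy)
    if rel_speed == 0.0:
        return False  # no relative motion, so the range never closes

    # Half-angle of the collision cone as seen from the own ship.
    half_angle = math.asin(safe_radius / dist)

    # Angle between the relative velocity and the line of sight.
    cos_angle = (rx * vx + ry * vy) / (dist * rel_speed)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    angle = math.acos(cos_angle)

    return angle < half_angle


# Head-on encounter: the two ships close along the same line.
print(in_velocity_obstacle((0, 0), (0, 5), (0, 1000), (0, -5), 100))  # True
# Own ship steams at right angles to a stationary TS far ahead.
print(in_velocity_obstacle((0, 0), (5, 0), (0, 1000), (0, 0), 100))   # False
```

A reward function in the spirit of the summary could, for example, penalize actions for which this test returns True for any TS, but the weighting and the exact form of that term are not given here.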