Improving YOLOv8 with Scattering Transform and Attention for Maritime Awareness

Detailed bibliography
Published in: 2023 International Symposium on Image and Signal Processing and Analysis (ISPA), pp. 1-6
Main authors: Carrillo-Perez, Borja; Rodriguez, Angel Bueno; Barnes, Sarah; Stephan, Maurice
Format: Conference paper
Language: English
Published: IEEE, 18 September 2023
ISSN:1849-2266
Description
Summary: Ship recognition and georeferencing using monitoring cameras are crucial to many applications in maritime situational awareness. Although deep learning algorithms are available for ship recognition tasks, there is a need for innovative approaches that attain higher precision rates irrespective of ship sizes, types, or physical hardware limitations. Furthermore, their deployment in maritime environments requires embedded systems capable of image processing with balanced accuracy, reduced latency, and low energy consumption. To achieve this, we build upon the foundations of the standard YOLOv8 and present a novel architecture that improves the segmentation and georeferencing of ships in the context of maritime awareness using a real-world dataset (ShipSG). Our architecture synergizes global and local features in the image for improved ship segmentation and georeferencing. The 2D scattering transform enhances the YOLOv8 backbone by extracting global structural features from the image, and the addition of a convolutional block attention module (CBAM) in the head allows the network to focus on relevant spatial and channel-wise regions. We achieve a mAP of 75.46%, comparable to larger YOLOv8 models, at a much faster inference speed of 59.3 milliseconds per image when deployed on the NVIDIA Jetson Xavier AGX as the target embedded system. We applied the modified network to georeference the segmented ship masks, achieving a georeferencing distance error of 18 meters, which implies georeferencing performance comparable to non-embedded approaches.
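To illustrate the attention mechanism the abstract names, below is a minimal NumPy sketch of CBAM-style channel and spatial attention. It is not the authors' implementation: the tensor layout `(C, H, W)`, the weight names `w1`/`w2`, and the simple averaging used in place of CBAM's 7x7 convolution in the spatial branch are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """Channel attention: a shared two-layer MLP scores avg- and max-pooled
    channel descriptors, and their sum gates each channel of x (C, H, W)."""
    avg = x.mean(axis=(1, 2))  # (C,) global average pooling
    mx = x.max(axis=(1, 2))    # (C,) global max pooling
    att = sigmoid(w2 @ np.maximum(w1 @ avg, 0.0)
                  + w2 @ np.maximum(w1 @ mx, 0.0))  # (C,) gate in (0, 1)
    return x * att[:, None, None]

def spatial_attention(x):
    """Spatial attention: channel-wise avg and max maps are combined into a
    per-pixel gate. CBAM proper convolves their concatenation with a 7x7
    kernel; a plain average of the two maps stands in here (assumption)."""
    avg = x.mean(axis=0)  # (H, W)
    mx = x.max(axis=0)    # (H, W)
    att = sigmoid((avg + mx) / 2.0)  # (H, W) gate in (0, 1)
    return x * att[None, :, :]

def cbam(x, w1, w2):
    """Apply channel attention first, then spatial attention, as in CBAM."""
    return spatial_attention(channel_attention(x, w1, w2))
```

Because both gates are sigmoids, CBAM only rescales activations (each output magnitude is at most the input magnitude) and preserves the feature map's shape, which is what lets it be dropped into the detection head without changing the surrounding layers.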
DOI:10.1109/ISPA58351.2023.10279352