Dual-stage learning framework for underwater acoustic target recognition with cross-attention mechanism and audio-guided contrastive learning

Published in: Neurocomputing (Amsterdam), Volume 652, p. 131101
Main authors: Zhao, Rongyao; Liu, Feng; Zhao, Lyufang; Li, Daihui; Xu, Jing; Liu, Yuanxin; Shen, Tongsheng
Medium: Journal Article
Language: English
Published: Elsevier B.V., 01.11.2025
ISSN:0925-2312
Description
Summary: Underwater acoustic target recognition is crucial for marine exploration and environmental monitoring. However, the redundancy in raw time-domain signals and the challenges in effectively integrating time-frequency representations with audio features hinder current methods. To address these challenges, we introduce a dual-stage learning framework, named audio-guided cross-attention dual-stage network for underwater acoustic target recognition (ACDN-UATR). The approach combines a cross-attention mechanism with audio-guided contrastive learning to improve recognition accuracy. In the first stage, a masked autoencoder (MAE) is used to learn and reconstruct time-frequency features, while a cross-attention mechanism efficiently fuses features from different spectrograms. In the second stage, contrastive learning is employed to align features extracted from time-frequency representations and raw audio signals, enhancing feature consistency and recognition robustness. Experimental results demonstrate that ACDN-UATR effectively integrates both time-frequency and audio-based features, achieving high recognition accuracy on the ShipsEar dataset.

Highlights:
• Novel dual-stage learning framework (ACDN-UATR) proposed for UATR.
• Cross-attention MAE fuses complementary Mel and CQT spectrogram features.
• Audio-guided contrastive learning aligns multimodal acoustic features.
• Integrates time-frequency representation learning with raw audio analysis.
• Achieves superior underwater target recognition accuracy on ShipsEar dataset.
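The two core mechanisms the abstract names can be illustrated with a minimal NumPy sketch: single-head cross-attention that lets tokens from one spectrogram branch attend to the other (stage one), and an InfoNCE-style contrastive loss that pulls paired spectrogram and audio embeddings together (stage two). All dimensions, token counts, and the temperature value here are illustrative assumptions, not details taken from the paper; the weights are random rather than trained.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_tokens, kv_tokens):
    """Single-head cross-attention: queries from one spectrogram branch
    attend to keys/values from the other branch (illustrative, no learned
    projections)."""
    d = q_tokens.shape[-1]
    scores = q_tokens @ kv_tokens.T / np.sqrt(d)   # (Nq, Nkv) similarity
    return softmax(scores, axis=-1) @ kv_tokens    # weighted mix of kv tokens

def info_nce(z_spec, z_audio, temperature=0.07):
    """InfoNCE-style loss aligning paired spectrogram/audio embeddings:
    row i of z_spec should match row i of z_audio."""
    z_spec = z_spec / np.linalg.norm(z_spec, axis=1, keepdims=True)
    z_audio = z_audio / np.linalg.norm(z_audio, axis=1, keepdims=True)
    logits = z_spec @ z_audio.T / temperature      # (B, B) similarity matrix
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(z_spec))
    return -log_prob[idx, idx].mean()              # -log p(positive pair)

rng = np.random.default_rng(0)
# Stage 1 sketch: fuse hypothetical Mel and CQT patch tokens.
mel_tokens = rng.normal(size=(16, 64))             # 16 Mel patch tokens
cqt_tokens = rng.normal(size=(16, 64))             # 16 CQT patch tokens
fused = cross_attention(mel_tokens, cqt_tokens)    # Mel queries attend to CQT

# Stage 2 sketch: align a batch of paired embeddings.
z_spec = rng.normal(size=(8, 128))                 # spectrogram-branch embeddings
z_audio = z_spec + 0.1 * rng.normal(size=(8, 128)) # noisy paired audio embeddings
loss = info_nce(z_spec, z_audio)
```

Because each `z_audio[i]` is a small perturbation of `z_spec[i]`, the diagonal of the similarity matrix dominates and the contrastive loss is small; with unrelated pairs it would approach `log(B)`.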
DOI:10.1016/j.neucom.2025.131101