Dual-stage learning framework for underwater acoustic target recognition with cross-attention mechanism and audio-guided contrastive learning
| Published in: | Neurocomputing (Amsterdam), Vol. 652, p. 131101 |
|---|---|
| Main authors: | , , , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V., 01.11.2025 |
| Subjects: | |
| ISSN: | 0925-2312 |
| Online access: | Get full text |
| Summary: | Underwater acoustic target recognition is crucial for marine exploration and environmental monitoring. However, current methods are hindered by redundancy in raw time-domain signals and by the difficulty of effectively integrating time-frequency representations with audio features. To address these challenges, we introduce a dual-stage learning framework, the audio-guided cross-attention dual-stage network for underwater acoustic target recognition (ACDN-UATR). The approach combines a cross-attention mechanism with audio-guided contrastive learning to improve recognition accuracy. In the first stage, a masked autoencoder (MAE) learns to reconstruct time-frequency features, while a cross-attention mechanism efficiently fuses features from different spectrograms. In the second stage, contrastive learning aligns features extracted from time-frequency representations and raw audio signals, enhancing feature consistency and recognition robustness. Experimental results demonstrate that ACDN-UATR effectively integrates both time-frequency and audio-based features, achieving high recognition accuracy on the ShipsEar dataset. |
|---|---|
| Highlights: | • Novel dual-stage learning framework (ACDN-UATR) proposed for UATR. • Cross-attention MAE fuses complementary Mel and CQT spectrogram features. • Audio-guided contrastive learning aligns multimodal acoustic features. • Integrates time-frequency representation learning with raw audio analysis. • Achieves superior underwater target recognition accuracy on the ShipsEar dataset. |
| DOI: | 10.1016/j.neucom.2025.131101 |
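The abstract's first stage fuses Mel and CQT spectrogram features via cross-attention. The paper's exact architecture is not reproduced in this record, but the core operation can be illustrated with a minimal single-head cross-attention sketch, assuming patch embeddings of dimension 64 and no learned projection matrices (both are hypothetical choices for illustration only):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, context_feats):
    """Single-head cross-attention: each query patch (e.g. a Mel-spectrogram
    embedding) attends over all context patches (e.g. CQT embeddings).
    Illustrative sketch only; real models add learned Q/K/V projections."""
    d_k = query_feats.shape[-1]
    scores = query_feats @ context_feats.T / np.sqrt(d_k)  # (Nq, Nc)
    attn = softmax(scores, axis=-1)                        # rows sum to 1
    return attn @ context_feats                            # (Nq, d) fused output

rng = np.random.default_rng(0)
mel = rng.normal(size=(16, 64))   # 16 hypothetical Mel patch embeddings
cqt = rng.normal(size=(20, 64))   # 20 hypothetical CQT patch embeddings
fused = cross_attention(mel, cqt)
print(fused.shape)  # (16, 64)
```

Each Mel patch thus receives a convex combination of CQT patches, which is one plausible way complementary spectrogram views can be merged before MAE reconstruction.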
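The second stage aligns time-frequency and raw-audio embeddings with contrastive learning. The record does not specify the loss used, but a symmetric InfoNCE objective is a common choice for such alignment; the sketch below assumes L2-normalized embeddings paired by row index and a temperature of 0.1 (all assumptions, not details from the paper):

```python
import numpy as np

def info_nce(z_spec, z_audio, temperature=0.1):
    """Symmetric InfoNCE loss: each spectrogram embedding should be most
    similar to the raw-audio embedding of the same clip (same row index)."""
    z_spec = z_spec / np.linalg.norm(z_spec, axis=1, keepdims=True)
    z_audio = z_audio / np.linalg.norm(z_audio, axis=1, keepdims=True)
    logits = z_spec @ z_audio.T / temperature      # (B, B); positives on diagonal
    idx = np.arange(len(z_spec))

    def ce(l):
        # Cross-entropy of each row against its diagonal (matching-pair) target.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[idx, idx].mean()

    return (ce(logits) + ce(logits.T)) / 2

rng = np.random.default_rng(1)
z = rng.normal(size=(8, 32))                       # 8 hypothetical clip embeddings
aligned = info_nce(z, z)                           # perfectly matched views
mismatched = info_nce(z, rng.normal(size=(8, 32))) # unrelated views
```

Minimizing this loss pulls the two views of each clip together while pushing apart embeddings of different clips, which is the "feature consistency" effect the abstract describes.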