Learning discriminative representations from integrated features for DOA estimation

Bibliographic details
Published in: Journal of King Saud University. Computer and information sciences, Vol. 37, No. 10, p. 323 - 20
Main authors: You, Qi; Huang, Qinghua
Format: Journal Article
Language: English
Published: Cham, Springer International Publishing, 01.12.2025
Springer Nature B.V
Springer
ISSN: 1319-1578, 2213-1248
Description
Summary: As a fundamental step in array signal processing, accurate direction-of-arrival (DOA) estimation is crucial for speaker localization using microphone arrays. Noise, reverberation, and an unknown number of sources in realistic environments pose significant challenges, making the extraction of discriminative representations a key step in DOA estimation. These representations need to reduce the influence of redundant information unrelated to localization, yet recent methods have largely overlooked this important characteristic. To address these issues, we propose an end-to-end feature integration and discriminative learning network (FID-Net) for multi-source DOA estimation. Specifically, our approach consists of three stages: the feature integration stage, the discriminative learning stage, and the temporal modeling stage. In the feature integration stage, we aim to capture multi-scale spatial information that is critical for localization. In the discriminative learning stage, we introduce a discriminative representation learning strategy and design a mutual information-based loss to guide the network to better capture the differences among diverse features. The discriminative features are further utilized in the temporal modeling stage to enhance the global contextual representation. Experimental results on both simulated and real-world datasets demonstrate the superior performance of the proposed method compared with other advanced methods.
DOI: 10.1007/s44443-025-00356-0