Learning discriminative representations from integrated features for DOA estimation

Bibliographic Details
Published in: Journal of King Saud University. Computer and information sciences Vol. 37; no. 10; pp. 323 - 20
Main Authors: You, Qi; Huang, Qinghua
Format: Journal Article
Language: English
Published: Cham Springer International Publishing 01.12.2025
Springer Nature B.V
Springer
ISSN: 1319-1578, 2213-1248
Description
Summary: As a fundamental step in array signal processing, accurate direction-of-arrival (DOA) estimation is crucial for speaker localization using microphone arrays. Noise, reverberation, and an unknown number of sources in realistic environments pose significant challenges, making the extraction of discriminative representations a key step in DOA estimation. These representations need to reduce the influence of redundant information unrelated to localization, yet recent methods have largely overlooked this important characteristic. To address these issues, we propose an end-to-end feature integration and discriminative learning network (FID-Net) for multi-source DOA estimation. Specifically, our approach consists of three stages: the feature integration stage, the discriminative learning stage, and the temporal modeling stage. In the feature integration stage, we aim to capture multi-scale spatial information that is critical for localization. In the discriminative learning stage, we introduce a discriminative representation learning strategy and design a mutual information-based loss to guide the network to better capture the differences among diverse features. The discriminative features are further utilized in the temporal modeling stage to enhance the global contextual representation. Experimental results on both simulated and real-world datasets demonstrate the superior performance of the proposed method compared with other advanced methods.
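The summary mentions a mutual information-based loss but does not define it. As background only, the sketch below shows what mutual information measures for two paired discrete sequences, using a plain plug-in (counting) estimate. It is not the loss used in FID-Net: the function name `mutual_information`, the discrete inputs, and the counting estimator are all assumptions made for illustration, and the paper's loss presumably operates on learned continuous features.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats from paired discrete samples.

    Illustrative helper (hypothetical name), not the FID-Net loss.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


# Identical sequences share all of their information: I(X; X) = H(X).
same = mutual_information([0, 1, 0, 1], [0, 1, 0, 1])
# Sequences whose pairs are uniformly spread share none.
indep = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

Here `same` equals the entropy of a fair binary variable, log 2 in nats, and `indep` is zero. A loss built on such a quantity can push two sets of features to be less redundant (lower mutual information) or more informative about a target (higher mutual information); which direction FID-Net uses is not stated in the summary.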
DOI: 10.1007/s44443-025-00356-0