Synthetic Aperture Local Conformal Autoencoder for Semi-Supervised Speaker's DOA Tracking
| Published in: | IEEE Transactions on Audio, Speech and Language Processing, vol. 33, pp. 2918-2931 |
|---|---|
| Main authors: | , , |
| Format: | Journal Article |
| Language: | English |
| Published: | IEEE, 2025 |
| Subjects: | |
| ISSN: | 2998-4173 |
| Online access: | Full text |
| Summary: | In this article, we address the problem of tracking the direction of arrival (DOA) of a moving speaker in noisy and reverberant environments. We aim to achieve this in real-time, using as few measurements as possible. Toward this goal, we present a semi-supervised learning scheme that tracks the speaker using acoustic features only. Specifically, we extend a recently proposed method called local conformal autoencoder (LOCA), a deep neural network (DNN)-based dimensionality reduction technique that considers the local information from adjacent measurements. We design a unique training procedure for LOCA, which uses a synthetic aperture training paradigm to take advantage of the speaker's movement during training. We also add an anchoring loss term to the unsupervised LOCA model, which improves training stability and establishes a connection between the encoder mapping and the real-world position of the speaker. Finally, we conduct a comprehensive simulation study to demonstrate the effectiveness of our proposed method in dynamic environments with various levels of noise and reverberation. |
|---|---|
| ISSN: | 2998-4173 |
| DOI: | 10.1109/TASLPRO.2025.3587465 |