Synthetic Aperture Local Conformal Autoencoder for Semi-Supervised Speaker's DOA Tracking

Bibliographic Details
Published in:	IEEE Transactions on Audio, Speech and Language Processing, vol. 33, pp. 2918-2931
Main Authors:	Cohen, Idan; Gannot, Sharon; Lindenbaum, Ofir
Format:	Journal Article
Language:	English
Published:	IEEE 2025
Subjects:	Acoustic measurements; Acoustics; Apertures; Autoencoders; Hidden Markov models; Kalman filters; LOCA; Location awareness; Manifolds; RTF; Semi-supervised learning; speaker tracking; Speech processing; synthetic aperture; Training
ISSN:	2998-4173
Online Access:	Full text

Description
Summary:	In this article, we address the problem of tracking the direction of arrival (DOA) of a moving speaker in noisy and reverberant environments. We aim to achieve this in real time, using as few measurements as possible. Toward this goal, we present a semi-supervised learning scheme that tracks the speaker using acoustic features only. Specifically, we extend a recently proposed method called the local conformal autoencoder (LOCA), a deep neural network (DNN)-based dimensionality reduction technique that considers the local information from adjacent measurements. We design a unique training procedure for LOCA, which uses a synthetic aperture training paradigm to take advantage of the speaker's movement during training. We also add an anchoring loss term to the unsupervised LOCA model, which improves training stability and establishes a connection between the encoder mapping and the real-world position of the speaker. Finally, we conduct a comprehensive simulation study to demonstrate the effectiveness of our proposed method in dynamic environments with various levels of noise and reverberation.
DOI:	10.1109/TASLPRO.2025.3587465