Model-Based Expectation-Maximization Source Separation and Localization

Detailed bibliography
Published in:	IEEE Transactions on Audio, Speech, and Language Processing, Vol. 18, No. 2, pp. 382-394
Main authors:	Mandel, M.I., Weiss, R.J., Ellis, D.
Format:	Journal Article
Language:	English
Published:	Piscataway, NJ: IEEE, 1 February 2010; Institute of Electrical and Electronics Engineers
Subjects:	Applied sciences; Background noise; Delay; Detection, estimation, filtering, equalization, prediction; Digital recording; Exact sciences and technology; Expectation-maximization algorithms; Information, signal and communications theory; Iterative algorithms; Maximum-likelihood estimation; Miscellaneous; Parameter estimation; Predictive models; Signal and communications theory; Signal processing; Signal representation. Spectral analysis; Signal, noise; Source separation; Spectrogram; Speech enhancement; Speech processing; Telecommunications and information theory; time-frequency masking; underdetermined source separation; Automatic classification; Signal estimation; Mixture theory; Sound quality; Spectrum analysis; Masking; Delay time; Target detection; Probabilistic approach; Two channel system; Sound source; Signal classification; Source localization; Phase delay; Maximum likelihood; EM algorithm; Signal analysis; Signal distortion
ISSN:	1558-7916

Description
Summary:	This paper describes a system, referred to as model-based expectation-maximization source separation and localization (MESSL), for separating and localizing multiple sound sources from an underdetermined reverberant two-channel recording. By clustering individual spectrogram points based on their interaural phase and level differences, MESSL generates masks that can be used to isolate individual sound sources. We first describe a probabilistic model of interaural parameters that can be evaluated at individual spectrogram points. By creating a mixture of these models over sources and delays, the multi-source localization problem is reduced to a collection of single source problems. We derive an expectation-maximization algorithm for computing the maximum-likelihood parameters of this mixture model, and show that these parameters correspond well with interaural parameters measured in isolation. As a byproduct of fitting this mixture model, the algorithm creates probabilistic spectrogram masks that can be used for source separation. In simulated anechoic and reverberant environments, separations using MESSL produced on average a signal-to-distortion ratio 1.6 dB greater and perceptual evaluation of speech quality (PESQ) results 0.27 mean opinion score units greater than four comparable algorithms.
DOI:	10.1109/TASL.2009.2029711