Coupled Dictionaries for Exemplar-Based Speech Enhancement and Automatic Speech Recognition

Bibliographic details
Published in:	IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 23, no. 11, pp. 1788-1799
Main authors:	Baby, Deepak; Virtanen, Tuomas; Gemmeke, Jort F.; Van hamme, Hugo
Format:	Journal Article
Language:	English
Published:	IEEE, 1 November 2015
Subjects:	Dictionaries; Discrete Fourier transforms; Exemplar-based; Modulation; modulation envelope; Noise; noise robust automatic speech recognition; non-negative sparse coding; Speech; Speech enhancement
ISSN:	2329-9290, 2329-9304

Description
Summary:	Exemplar-based speech enhancement systems decompose noisy speech as a weighted sum of speech and noise exemplars stored in a dictionary, then use the resulting speech and noise estimates to obtain a time-varying filter in the full-resolution frequency domain that enhances the noisy speech. To obtain the decomposition, exemplars sampled in lower-dimensional spaces are preferred over the full-resolution frequency domain because of their lower computational complexity and their ability to generalize better to unseen cases. However, the resulting filter may be sub-optimal, because mapping the obtained speech and noise estimates to the full-resolution frequency domain yields a low-rank approximation. This paper proposes an efficient way to directly compute the full-resolution frequency estimates of speech and noise using coupled dictionaries: an input dictionary containing atoms from the desired exemplar space to obtain the decomposition, and a coupled output dictionary containing exemplars from the full-resolution frequency domain. We also introduce modulation spectrogram features for exemplar-based tasks using this approach. The proposed system was evaluated for various choices of input exemplars and yielded improved speech enhancement performance on the AURORA-2 and AURORA-4 databases. We further show that the proposed approach also results in improved word error rates (WERs) for speech recognition tasks using HMM-GMM and deep neural network (DNN) based systems.
DOI:	10.1109/TASLP.2015.2450491
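
The coupled-dictionary idea in the summary can be illustrated with a minimal single-frame sketch. It is not the authors' implementation: the multiplicative-update solver, the pooling matrix `M` standing in for a lower-dimensional feature mapping, the random dictionaries, and all function and variable names are assumptions made for illustration. The sketch estimates the weights with the input dictionary, reuses the same weights with the coupled full-resolution output dictionary, and forms a Wiener-like gain from the resulting speech and noise estimates.

```python
import numpy as np


def nonneg_activations(y, A, n_iter=200, sparsity=0.0, eps=1e-12):
    """Estimate non-negative weights x so that y is approximated by A @ x.

    Assumption: multiplicative updates for the generalized KL divergence
    with an optional L1 penalty. The summary does not name the solver.
    """
    x = np.ones(A.shape[1])
    denom = A.sum(axis=0) + sparsity  # A^T 1 + lambda
    for _ in range(n_iter):
        ratio = y / (A @ x + eps)
        x *= (A.T @ ratio) / (denom + eps)
    return x


def coupled_enhance(y_in, y_full, A_in, A_out, n_speech, eps=1e-12):
    """Decompose with A_in, reconstruct with A_out, then filter y_full.

    The first n_speech columns of both dictionaries are speech atoms and
    the remaining columns are noise atoms (a hypothetical layout).
    """
    x = nonneg_activations(y_in, A_in)
    speech = A_out[:, :n_speech] @ x[:n_speech]  # full-resolution speech estimate
    noise = A_out[:, n_speech:] @ x[n_speech:]   # full-resolution noise estimate
    gain = speech / (speech + noise + eps)       # Wiener-like gain per bin
    return gain * y_full, gain, x


# Toy data: 32 full-resolution bins, 8 pooled bands, 3 speech + 3 noise atoms.
rng = np.random.default_rng(0)
A_out = rng.random((32, 6))                    # output (full-resolution) dictionary
M = np.kron(np.eye(8), np.ones((1, 4)))        # hypothetical band-pooling matrix, 8 x 32
A_in = M @ A_out                               # coupled input dictionary
x_true = np.array([1.0, 0.0, 0.5, 0.0, 0.8, 0.0])
y_full = A_out @ x_true                        # noisy frame, full resolution
y_in = M @ y_full                              # same frame in the input space

enhanced, gain, x_hat = coupled_enhance(y_in, y_full, A_in, A_out, n_speech=3)
print(enhanced.shape, float(gain.min()), float(gain.max()))
```

Because the columns of `A_in` and `A_out` describe the same atoms in two spaces, the weights found in the cheap input space carry over directly to the full-resolution estimates, which avoids mapping low-dimensional estimates back to the frequency domain.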