Coupled Dictionaries for Exemplar-Based Speech Enhancement and Automatic Speech Recognition

Exemplar-based speech enhancement systems work by decomposing the noisy speech as a weighted sum of speech and noise exemplars stored in a dictionary and use the resulting speech and noise estimates to obtain a time-varying filter in the full-resolution frequency domain to enhance the noisy speech....

Detailed bibliography
Published in: IEEE/ACM transactions on audio, speech, and language processing, Volume 23; Issue 11; pp. 1788-1799
Main authors: Baby, Deepak; Virtanen, Tuomas; Gemmeke, Jort F.; Van hamme, Hugo
Format: Journal Article
Language: English
Published: IEEE 01.11.2015
Subjects:
ISSN:2329-9290, 2329-9304
Abstract: Exemplar-based speech enhancement systems work by decomposing the noisy speech as a weighted sum of speech and noise exemplars stored in a dictionary and using the resulting speech and noise estimates to obtain a time-varying filter in the full-resolution frequency domain to enhance the noisy speech. To obtain the decomposition, exemplars sampled in lower dimensional spaces are preferred over the full-resolution frequency domain for their reduced computational complexity and the ability to better generalize to unseen cases. However, the resulting filter may be sub-optimal, as the mapping of the obtained speech and noise estimates to the full-resolution frequency domain yields a low-rank approximation. This paper proposes an efficient way to directly compute the full-resolution frequency estimates of speech and noise using coupled dictionaries: an input dictionary containing atoms from the desired exemplar space to obtain the decomposition and a coupled output dictionary containing exemplars from the full-resolution frequency domain. We also introduce modulation spectrogram features for the exemplar-based tasks using this approach. The proposed system was evaluated for various choices of input exemplars and yielded improved speech enhancement performance on the AURORA-2 and AURORA-4 databases. We further show that the proposed approach also results in improved word error rates (WERs) for the speech recognition tasks using HMM-GMM and deep neural network (DNN) based systems.
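The coupled-dictionary idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes single-frame exemplars, a KL-divergence non-negative decomposition solved with multiplicative updates, and a Wiener-like gain, none of which the abstract specifies. The function names `nmf_activations` and `coupled_enhance` and all parameter names are hypothetical.

```python
import numpy as np


def nmf_activations(Y, A, n_iter=200, sparsity=0.0, eps=1e-12):
    """Estimate non-negative activations X such that Y is approximately A @ X.

    Uses the standard multiplicative update for the KL divergence with an
    optional L1 penalty (assumed here; the abstract names no solver).
    Y: (features, frames), A: (features, atoms), both non-negative.
    """
    rng = np.random.default_rng(0)
    X = rng.random((A.shape[1], Y.shape[1])) + eps
    col_sums = A.sum(axis=0)[:, None]  # A^T 1, one value per atom
    for _ in range(n_iter):
        X *= (A.T @ (Y / (A @ X + eps))) / (col_sums + sparsity + eps)
    return X


def coupled_enhance(Y_in, Y_full, A_in_speech, A_in_noise,
                    A_out_speech, A_out_noise, eps=1e-12):
    """Decompose in the input (low-dimensional) space, then reconstruct
    speech and noise directly in the full-resolution space by reusing the
    same activations with the coupled output dictionaries."""
    A_in = np.hstack([A_in_speech, A_in_noise])
    X = nmf_activations(Y_in, A_in)
    n_speech = A_in_speech.shape[1]
    S_hat = A_out_speech @ X[:n_speech]   # full-resolution speech estimate
    N_hat = A_out_noise @ X[n_speech:]    # full-resolution noise estimate
    gain = S_hat / (S_hat + N_hat + eps)  # time-varying filter, values in [0, 1]
    return gain * Y_full, gain
```

In this sketch, atom k of each input dictionary and atom k of the corresponding output dictionary are assumed to be computed from the same underlying signal segment, which is what lets the activations transfer between the two spaces.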
Authors and affiliations:
1. Deepak Baby (ORCID 0000-0001-8935-9068); KU Leuven, Speech Processing Research Group, Electrical Engineering Department (ESAT), Leuven, Belgium; Deepak.Baby@esat.kuleuven.be
2. Tuomas Virtanen; Department of Signal Processing, Tampere University of Technology, Tampere, Finland; Tuomas.Virtanen@tut.fi
3. Jort F. Gemmeke; KU Leuven, Speech Processing Research Group, Electrical Engineering Department (ESAT), Leuven, Belgium; jgemmeke@amadana.nl
4. Hugo Van hamme; KU Leuven, Speech Processing Research Group, Electrical Engineering Department (ESAT), Leuven, Belgium; Hugo.Vanhamme@esat.kuleuven.be
CODEN ITASD8
DOI 10.1109/TASLP.2015.2450491
Discipline Engineering
EISSN 2329-9304
EndPage 1799
Genre orig-research
Funding: European Commission, grant FP7-PEOPLE-2011-290000 (funder ID 10.13039/501100000780); IWT-SBO, grant 100049
Peer reviewed: true
Scholarly: true
License: https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
Page count: 12
Journal abbreviation: TASLP
Subject terms: Dictionaries; Discrete Fourier transforms; Exemplar-based; Modulation; modulation envelope; Noise; noise robust automatic speech recognition; non-negative sparse coding; Speech; Speech enhancement
URI https://ieeexplore.ieee.org/document/7138598