Coupled Dictionaries for Exemplar-Based Speech Enhancement and Automatic Speech Recognition
Exemplar-based speech enhancement systems work by decomposing the noisy speech as a weighted sum of speech and noise exemplars stored in a dictionary and use the resulting speech and noise estimates to obtain a time-varying filter in the full-resolution frequency domain to enhance the noisy speech....
| Published in: | IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 23, No. 11, pp. 1788-1799 |
|---|---|
| Main authors: | Deepak Baby, Tuomas Virtanen, Jort F. Gemmeke, Hugo Van hamme |
| Format: | Journal Article |
| Language: | English |
| Publication details: | IEEE, 1 November 2015 |
| Subject: | |
| ISSN: | 2329-9290, 2329-9304 |
| Online access: | Get full text |
| Abstract | Exemplar-based speech enhancement systems decompose noisy speech as a weighted sum of speech and noise exemplars stored in a dictionary, then use the resulting speech and noise estimates to obtain a time-varying filter in the full-resolution frequency domain that enhances the noisy speech. To obtain the decomposition, exemplars sampled in lower-dimensional spaces are preferred over the full-resolution frequency domain because of their reduced computational complexity and their ability to generalize better to unseen cases. However, the resulting filter may be sub-optimal, as mapping the obtained speech and noise estimates to the full-resolution frequency domain yields a low-rank approximation. This paper proposes an efficient way to directly compute the full-resolution frequency estimates of speech and noise using coupled dictionaries: an input dictionary containing atoms from the desired exemplar space to obtain the decomposition, and a coupled output dictionary containing exemplars from the full-resolution frequency domain. We also introduce modulation spectrogram features for exemplar-based tasks using this approach. The proposed system was evaluated for various choices of input exemplars and yielded improved speech enhancement performance on the AURORA-2 and AURORA-4 databases. We further show that the proposed approach also results in improved word error rates (WERs) for speech recognition tasks using HMM-GMM and deep neural network (DNN) based systems. |
|---|---|
| Author | Van hamme, Hugo; Gemmeke, Jort F.; Baby, Deepak; Virtanen, Tuomas |
| Author_xml | 1. Deepak Baby (ORCID 0000-0001-8935-9068), Deepak.Baby@esat.kuleuven.be, KU Leuven, Speech Processing Research Group, Electrical Engineering Department (ESAT), Leuven, Belgium; 2. Tuomas Virtanen, Tuomas.Virtanen@tut.fi, Department of Signal Processing, Tampere University of Technology, Tampere, Finland; 3. Jort F. Gemmeke, jgemmeke@amadana.nl, KU Leuven, Speech Processing Research Group, Electrical Engineering Department (ESAT), Leuven, Belgium; 4. Hugo Van hamme, Hugo.Vanhamme@esat.kuleuven.be, KU Leuven, Speech Processing Research Group, Electrical Engineering Department (ESAT), Leuven, Belgium |
| CODEN | ITASD8 |
| ContentType | Journal Article |
| DOI | 10.1109/TASLP.2015.2450491 |
| Discipline | Engineering |
| EISSN | 2329-9304 |
| EndPage | 1799 |
| ExternalDocumentID | 10_1109_TASLP_2015_2450491 7138598 |
| Genre | orig-research |
| GrantInformation_xml | 1. European Commission, grant ID FP7-PEOPLE-2011-290000, funder ID 10.13039/501100000780; 2. IWT-SBO, grant ID 100049 |
| ISICitedReferencesCount | 22 |
| ISSN | 2329-9290 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 11 |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html |
| ORCID | 0000-0001-8935-9068 |
| PageCount | 12 |
| PublicationCentury | 2000 |
| PublicationDate | 2015-Nov. 2015-11-00 |
| PublicationDateYYYYMMDD | 2015-11-01 |
| PublicationDate_xml | – month: 11 year: 2015 text: 2015-Nov. |
| PublicationDecade | 2010 |
| PublicationTitle | IEEE/ACM transactions on audio, speech, and language processing |
| PublicationTitleAbbrev | TASLP |
| PublicationYear | 2015 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| StartPage | 1788 |
| SubjectTerms | Dictionaries; Discrete Fourier transforms; Exemplar-based; Modulation; modulation envelope; Noise; noise robust automatic speech recognition; non-negative sparse coding; Speech; Speech enhancement |
| Title | Coupled Dictionaries for Exemplar-Based Speech Enhancement and Automatic Speech Recognition |
| URI | https://ieeexplore.ieee.org/document/7138598 |
| Volume | 23 |
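
The coupled-dictionary idea summarized in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the multiplicative-update rule (non-negative decomposition under the Kullback-Leibler divergence with a fixed dictionary), the Wiener-like gain, and every name used here (`estimate_activations`, `coupled_enhance`, `A_in`, `A_out`, `n_speech`, the pooling matrix `M`) are assumptions chosen for illustration. The sketch estimates activations in a low-dimensional input space, then reuses the same activations with a full-resolution output dictionary to build a time-varying filter.

```python
import numpy as np


def estimate_activations(Y_in, A_in, n_iter=200, sparsity=0.0, eps=1e-12):
    """Find non-negative X such that Y_in is approximately A_in @ X.

    Assumed update rule: multiplicative updates for the KL divergence
    with the dictionary A_in held fixed; `sparsity` adds an L1 penalty.
    """
    rng = np.random.default_rng(0)
    X = rng.random((A_in.shape[1], Y_in.shape[1])) + eps
    col_sums = A_in.sum(axis=0)[:, None]  # the A^T 1 term of the update
    for _ in range(n_iter):
        ratio = Y_in / (A_in @ X + eps)
        X *= (A_in.T @ ratio) / (col_sums + sparsity + eps)
    return X


def coupled_enhance(Y_in, Y_full, A_in, A_out, n_speech, eps=1e-12):
    """Decompose in the input space, reconstruct with the output dictionary.

    The first `n_speech` atoms of both dictionaries are taken to be speech
    exemplars and the remaining atoms noise exemplars (an assumed layout).
    """
    X = estimate_activations(Y_in, A_in)
    S_hat = A_out[:, :n_speech] @ X[:n_speech]  # full-resolution speech estimate
    N_hat = A_out[:, n_speech:] @ X[n_speech:]  # full-resolution noise estimate
    gain = S_hat / (S_hat + N_hat + eps)        # Wiener-like filter (assumed form)
    return gain * Y_full, gain


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A_out = rng.random((64, 10))       # toy full-resolution exemplars
    M = rng.random((8, 64))            # toy pooling to a low-dimensional space
    A_in = M @ A_out                   # coupled input dictionary
    Y_full = A_out @ rng.random((10, 5))
    Y_in = M @ Y_full
    enhanced, gain = coupled_enhance(Y_in, Y_full, A_in, A_out, n_speech=6)
    print(enhanced.shape, float(gain.min()), float(gain.max()))
```

Because the filter is computed from full-resolution speech and noise estimates, it is not limited to the resolution of the input exemplar space, which is the motivation the abstract gives for coupling the two dictionaries.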