Whispered Speech Recognition Using Deep Denoising Autoencoder and Inverse Filtering
| Published in: | IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 25, No. 12, pp. 2313-2322 |
|---|---|
| Main authors: | Grozdic, Dorde T.; Jovicic, Slobodan T. |
| Format: | Journal Article |
| Language: | English |
| Published: | IEEE, 2017-12-01 |
| ISSN: | 2329-9290, 2329-9304 |
| Online access: | Full text |
| Abstract | Due to the profound differences between the acoustic characteristics of neutral and whispered speech, the performance of traditional automatic speech recognition (ASR) systems trained on neutral speech degrades significantly when they are applied to whispered speech. To analyze this mismatched train/test situation in depth and to develop an efficient approach to whisper recognition, this study first analyzes the acoustic characteristics of whispered speech, addresses the problems of whispered speech recognition in mismatched conditions, and then proposes new robust cepstral features and a preprocessing approach based on a deep denoising autoencoder (DDAE) that enhance whisper recognition. The experimental results confirm that Teager-energy-based cepstral features, especially Teager energy cepstral coefficients (TECCs), are more robust and better whisper descriptors than traditional Mel-frequency cepstral coefficients (MFCCs). Further detailed analysis of cepstral distances, distributions of cepstral coefficients, and confusion matrices, together with experiments with inverse filtering, shows that voicing in speech stimuli is the main cause of word misclassification in mismatched train/test scenarios. The new framework, based on the DDAE and TECC features, significantly improves whisper recognition accuracy and outperforms the traditional MFCC and GMM-HMM (Gaussian mixture model-hidden Markov model) baseline, yielding an absolute improvement of 31% in whisper recognition accuracy. The achieved word recognition rate in the neutral/whisper scenario is 92.81%. |
|---|---|
| Author | Jovicic, Slobodan T.; Grozdic, Dorde T. |
| Author_xml | 1. Grozdic, Dorde T. (djordjegrozdic@gmail.com), Sch. of Electr. Eng., Univ. of Belgrade, Belgrade, Serbia; 2. Jovicic, Slobodan T. (jovicic@etf.rs), Sch. of Electr. Eng., Univ. of Belgrade, Belgrade, Serbia |
| CODEN | ITASD8 |
| ContentType | Journal Article |
| DOI | 10.1109/TASLP.2017.2738559 |
| Discipline | Engineering; Biology |
| EISSN | 2329-9304 |
| EndPage | 2322 |
| Genre | orig-research |
| GrantInformation_xml | Funder: Serbian Ministry of Education, Science and Technological Development; grant IDs: TR 32032; OI 178027 |
| ISICitedReferencesCount | 49 |
| ISSN | 2329-9290 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 12 |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html |
| PageCount | 10 |
| PublicationDate | 2017-12-01 |
| PublicationTitle | IEEE/ACM transactions on audio, speech, and language processing |
| PublicationTitleAbbrev | TASLP |
| PublicationYear | 2017 |
| Publisher | IEEE |
| StartPage | 2313 |
| SubjectTerms | Automatic speech recognition; Biology; Character recognition; deep denoising autoencoder; Encoding; Inverse filtering; Mel frequency cepstral coefficient; Noise levels; Speech recognition; Teager-energy operator; whispered speech recognition |
| Title | Whispered Speech Recognition Using Deep Denoising Autoencoder and Inverse Filtering |
| URI | https://ieeexplore.ieee.org/document/8114355 |
| Volume | 25 |