Whispered Speech Recognition Using Deep Denoising Autoencoder and Inverse Filtering

Bibliographic details
Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 25, No. 12, pp. 2313-2322
Main authors: Grozdic, Dorde T.; Jovicic, Slobodan T.
Format: Journal Article
Language: English
Published: IEEE, 1 December 2017
ISSN: 2329-9290, 2329-9304
Online access: Full text
Abstract: Due to the profound differences between the acoustic characteristics of neutral and whispered speech, the performance of traditional automatic speech recognition (ASR) systems trained on neutral speech degrades significantly when they are applied to whispered speech. To analyze this mismatched train/test situation in depth and to develop an efficient approach to whisper recognition, this study first analyzes the acoustic characteristics of whispered speech, addresses the problems of whispered speech recognition in mismatched conditions, and then proposes new robust cepstral features and a preprocessing approach based on a deep denoising autoencoder (DDAE) that enhance whisper recognition. The experimental results confirm that Teager-energy-based cepstral features, especially TECCs, are more robust and better whisper descriptors than traditional Mel-frequency cepstral coefficients (MFCCs). Further detailed analysis of cepstral distances, distributions of cepstral coefficients, and confusion matrices, and experiments with inverse filtering prove that voicing in speech stimuli is the main cause of word misclassification in mismatched train/test scenarios. The new framework, based on the DDAE and TECC features, significantly improves whisper recognition accuracy and outperforms the traditional MFCC and GMM-HMM (Gaussian mixture density-hidden Markov model) baseline, yielding an absolute improvement of 31% in whisper recognition accuracy. The achieved word recognition rate in the neutral/whisper scenario is 92.81%.
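The Teager-energy-based features named in the abstract build on the discrete Teager-Kaiser energy operator, psi[n] = x[n]^2 - x[n-1]*x[n+1]. The sketch below illustrates only that operator, under the assumption that NumPy is available. The function name `teager_energy` and the test tone are hypothetical, and the paper's actual TECC pipeline (framing, filterbank, cepstral transform) is not described in this record and is not reproduced here.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager-Kaiser energy operator.

    Returns psi[n] = x[n]^2 - x[n-1] * x[n+1] for the interior samples,
    so the output is two samples shorter than the input.
    """
    x = np.asarray(x, dtype=float)
    if x.size < 3:
        raise ValueError("need at least 3 samples")
    return x[1:-1] ** 2 - x[:-2] * x[2:]


# Illustrative input (an assumption, not data from the paper): a pure tone.
# For A*cos(omega*n + phi) the operator is constant: A^2 * sin(omega)^2.
n = np.arange(200)
A, omega = 0.5, 0.3
tone = A * np.cos(omega * n + 0.1)
psi = teager_energy(tone)
```

Because the operator responds to both amplitude and frequency, it is often used as an energy measure in place of the squared signal; whether the paper applies it per subband or to the full-band signal cannot be determined from this record.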
Author Jovicic, Slobodan T.
Grozdic, Dorde T.
Author_xml – sequence: 1
  givenname: Dorde T.
  surname: Grozdic
  fullname: Grozdic, Dorde T.
  email: djordjegrozdic@gmail.com
  organization: Sch. of Electr. Eng., Univ. of Belgrade, Belgrade, Serbia
– sequence: 2
  givenname: Slobodan T.
  surname: Jovicic
  fullname: Jovicic, Slobodan T.
  email: jovicic@etf.rs
  organization: Sch. of Electr. Eng., Univ. of Belgrade, Belgrade, Serbia
CODEN ITASD8
ContentType Journal Article
DOI 10.1109/TASLP.2017.2738559
Discipline Engineering
Biology
EISSN 2329-9304
EndPage 2322
ExternalDocumentID 10_1109_TASLP_2017_2738559
8114355
Genre orig-research
GrantInformation_xml – fundername: Serbian Ministry of Education, Science and Technological Development
  grantid: TR 32032; OI 178027
ISICitedReferencesCount 49
ISSN 2329-9290
IsPeerReviewed true
IsScholarly true
Issue 12
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
PageCount 10
PublicationCentury 2000
PublicationDate 2017-Dec.
2017-12-00
PublicationDateYYYYMMDD 2017-12-01
PublicationDate_xml – month: 12
  year: 2017
  text: 2017-Dec.
PublicationDecade 2010
PublicationTitle IEEE/ACM transactions on audio, speech, and language processing
PublicationTitleAbbrev TASLP
PublicationYear 2017
Publisher IEEE
Publisher_xml – name: IEEE
StartPage 2313
SubjectTerms Automatic speech recognition
Biology
Character recognition
deep denoising autoencoder
Encoding
Inverse filtering
Mel frequency cepstral coefficient
Noise levels
Speech recognition
Teager-energy operator
whispered speech recognition
Title Whispered Speech Recognition Using Deep Denoising Autoencoder and Inverse Filtering
URI https://ieeexplore.ieee.org/document/8114355
Volume 25