Whispered speech recognition using deep denoising autoencoder

Bibliographic details
Published in: Engineering Applications of Artificial Intelligence, Vol. 59, pp. 15-22
Main authors: Grozdić, Đorđe T.; Jovičić, Slobodan T.; Subotić, Miško
Format: Journal Article
Language: English
Publication details: Elsevier Ltd, 1 March 2017
Subjects:
ISSN: 0952-1976, 1873-6769
Abstract Recently, Deep Denoising Autoencoders (DDAE) have shown state-of-the-art performance on various machine learning tasks. In this paper, the authors extend this approach to whispered speech recognition, which is one of the most challenging problems in Automatic Speech Recognition (ASR). Because the acoustic characteristics of neutral and whispered speech differ profoundly, the performance of traditional ASR systems trained on neutral speech degrades significantly when they are applied to whispered speech. The proposed deep learning system alleviates this mismatch between training and testing by applying a DDAE to generate whisper-robust cepstral features. The system was compared, in terms of word recognition accuracy, with a conventional Hidden Markov Model (HMM) speech recognizer on an isolated word recognition task using a real database of whispered speech (WhiSpe). Three types of cepstral coefficients were used in the experiments: MFCC (Mel-Frequency Cepstral Coefficients), TECC (Teager-Energy Cepstral Coefficients) and TEMFCC (Teager-based Mel-Frequency Cepstral Coefficients). The experimental results showed that the proposed system significantly improves whisper recognition accuracy and outperforms the traditional HMM-MFCC baseline, with an absolute improvement of 31% in whisper recognition accuracy. The highest word recognition rate in whispered speech, 92.81%, was achieved with the TECC feature.
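Two of the three feature types named in the abstract (TECC and TEMFCC) are built on Teager energy. The abstract does not describe how the authors compute these features, so the following is only a minimal sketch of the standard discrete Teager-Kaiser energy operator, psi[n] = x[n]^2 - x[n-1] * x[n+1], and not a reconstruction of the paper's feature pipeline. The function name `teager_energy` and the test signal are assumptions made for illustration.

```python
import math


def teager_energy(x):
    """Discrete Teager-Kaiser energy of a sample sequence (hypothetical helper).

    Returns psi[n] = x[n]**2 - x[n-1] * x[n+1] for every interior sample,
    so the output is two samples shorter than the input.
    """
    if len(x) < 3:
        return []  # no interior samples to evaluate
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Illustration on an assumed test signal: a sinusoid A*cos(omega*n).
# For such a signal the operator yields the constant A**2 * sin(omega)**2,
# which tracks both amplitude and frequency.
A, omega = 2.0, 0.3
signal = [A * math.cos(omega * n) for n in range(50)]
psi = teager_energy(signal)
expected = A ** 2 * math.sin(omega) ** 2
print(all(abs(p - expected) < 1e-9 for p in psi))
```

In a cepstral front end, an operator of this kind would typically be applied to band-limited signals before log compression and a cosine transform; those later stages are omitted here because the record gives no details about them.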
Author Subotić, Miško
Jovičić, Slobodan T.
Grozdić, Đorđe T.
Author_xml – sequence: 1
  givenname: Đorđe T.
  surname: Grozdić
  fullname: Grozdić, Đorđe T.
  email: djordjegrozdic@gmail.com
  organization: School of Electrical Engineering, University of Belgrade, Bulevar Kralja Aleksandra 73, 11000 Belgrade, Serbia
– sequence: 2
  givenname: Slobodan T.
  surname: Jovičić
  fullname: Jovičić, Slobodan T.
  email: jovicic@etf.rs
  organization: School of Electrical Engineering, University of Belgrade, Bulevar Kralja Aleksandra 73, 11000 Belgrade, Serbia
– sequence: 3
  givenname: Miško
  surname: Subotić
  fullname: Subotić, Miško
  email: ifp2@ikomline.net
  organization: Life Activities Advancement Center, Laboratory for Forensic Acoustics and Phonetics, Gospodar Jovanova 35, 11000 Belgrade, Serbia
CitedBy_id crossref_primary_10_1016_j_iswa_2022_200066
crossref_primary_10_1016_j_optlastec_2023_109417
crossref_primary_10_1016_j_apacoust_2020_107573
crossref_primary_10_1016_j_asoc_2019_105904
crossref_primary_10_1155_2022_8279856
crossref_primary_10_3390_app13127008
crossref_primary_10_1016_j_seta_2019_100601
crossref_primary_10_1016_j_knosys_2019_104874
crossref_primary_10_2478_amns_2023_2_01464
crossref_primary_10_1007_s10772_018_9502_0
crossref_primary_10_1007_s00521_021_05767_4
crossref_primary_10_1016_j_compag_2019_02_021
crossref_primary_10_1016_j_compstruct_2022_116263
crossref_primary_10_1007_s00521_021_05878_y
crossref_primary_10_1038_s41598_022_22075_6
crossref_primary_10_1109_ACCESS_2024_3414435
crossref_primary_10_3390_app13074331
crossref_primary_10_1007_s11042_023_15598_1
crossref_primary_10_4316_AECE_2023_03001
crossref_primary_10_1007_s11227_024_06098_6
crossref_primary_10_1109_ACCESS_2018_2820510
crossref_primary_10_1007_s00521_018_3623_x
crossref_primary_10_1007_s11042_017_5174_z
crossref_primary_10_1016_j_ins_2021_01_064
crossref_primary_10_1007_s43670_023_00053_x
crossref_primary_10_3390_app12104841
crossref_primary_10_1121_10_0003339
crossref_primary_10_3390_app14188223
crossref_primary_10_1063_1_5057725
crossref_primary_10_1016_j_engappai_2020_103903
crossref_primary_10_1016_j_csl_2023_101549
crossref_primary_10_1016_j_cegh_2018_12_004
crossref_primary_10_1111_coin_12281
crossref_primary_10_1016_j_engappai_2024_108685
crossref_primary_10_1016_j_engappai_2017_09_002
crossref_primary_10_4316_AECE_2017_01004
crossref_primary_10_1007_s41365_018_0402_4
crossref_primary_10_1016_j_asoc_2022_109785
crossref_primary_10_1016_j_asoc_2020_107003
Cites_doi 10.1109/ICASSP.1998.674489
10.21437/Interspeech.2004-565
10.21437/Interspeech.2014-232
10.1109/NEUREL.2014.7011492
10.21437/Interspeech.2005-142
10.1109/ICASSP.2014.6854059
10.21437/Interspeech.2014-294
10.1109/ICASSP.1990.115702
10.1109/ISCAS.2013.6571843
10.1109/ICASSP.2010.5495022
10.1007/978-3-319-11581-8_31
10.1162/089976602760128018
10.21437/Interspeech.2007-621
10.1561/2000000039
10.1186/1687-6180-2012-157
10.1007/978-3-642-40585-3_74
10.1016/j.specom.2003.10.005
10.1016/j.jvoice.2006.08.012
10.1561/2200000006
10.1109/ISCSLP.2012.6423522
10.1109/TASL.2009.2034770
10.1109/TELFOR.2012.6419311
10.1109/TASSP.1980.1163453
10.1145/1390156.1390294
10.1044/jshr.2702.251
10.21437/Interspeech.2014-380
10.1109/ICASSP.2013.6639243
10.1109/TASL.2010.2066967
10.1109/ICASSP.2015.7178927
10.1186/s13634-015-0246-6
10.1109/ICASSP.1993.319457
10.1016/j.engappai.2013.03.013
10.1109/TASL.2010.2091631
ContentType Journal Article
Copyright 2016 Elsevier Ltd
Copyright_xml – notice: 2016 Elsevier Ltd
DBID AAYXX
CITATION
DOI 10.1016/j.engappai.2016.12.012
DatabaseName CrossRef
DatabaseTitle CrossRef
DatabaseTitleList
DeliveryMethod fulltext_linktorsrc
Discipline Applied Sciences
Computer Science
EISSN 1873-6769
EndPage 22
ExternalDocumentID 10_1016_j_engappai_2016_12_012
S0952197616302391
ISICitedReferencesCount 56
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000393937400002&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
ISSN 0952-1976
IngestDate Sat Nov 29 02:17:56 EST 2025
Tue Nov 18 21:59:32 EST 2025
Fri Feb 23 02:28:55 EST 2024
IsPeerReviewed true
IsScholarly true
Keywords Deep learning
Deep denoising autoencoder (DDAE)
Automatic speech recognition (ASR)
Whispered speech
Language English
LinkModel OpenURL
PageCount 8
ParticipantIDs crossref_primary_10_1016_j_engappai_2016_12_012
crossref_citationtrail_10_1016_j_engappai_2016_12_012
elsevier_sciencedirect_doi_10_1016_j_engappai_2016_12_012
PublicationCentury 2000
PublicationDate March 2017
2017-03-00
PublicationDateYYYYMMDD 2017-03-01
PublicationDate_xml – month: 03
  year: 2017
  text: March 2017
PublicationDecade 2010
PublicationTitle Engineering applications of artificial intelligence
PublicationYear 2017
Publisher Elsevier Ltd
Publisher_xml – name: Elsevier Ltd
StartPage 15
SubjectTerms Automatic speech recognition (ASR)
Deep denoising autoencoder (DDAE)
Deep learning
Whispered speech
Title Whispered speech recognition using deep denoising autoencoder
URI https://dx.doi.org/10.1016/j.engappai.2016.12.012
Volume 59