Whispered speech recognition using deep denoising autoencoder
Recently Deep Denoising Autoencoders (DDAE) have shown state-of-the-art performance on various machine learning tasks. In this paper, the authors extended this approach to whispered speech recognition which is one of the most challenging problems in Automatic Speech Recognition (ASR). Namely, due to...
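
Two of the feature types named in this record's abstract (TECC and TEMFCC) are based on Teager energy. The record does not describe the authors' feature pipeline, so the following is only a minimal sketch of the standard discrete Teager energy operator, psi[n] = x[n]^2 - x[n-1]*x[n+1], and not the paper's implementation. The function name `teager_energy` and the test-tone parameters are assumptions chosen for illustration.

```python
import math


def teager_energy(x):
    """Discrete Teager energy for the interior samples of a sequence.

    Illustrative helper (hypothetical name, not from the paper):
    psi[n] = x[n]^2 - x[n-1] * x[n+1], for n = 1 .. len(x) - 2.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Sanity check on a pure tone A*cos(w*n): the operator should return
# the constant A^2 * sin(w)^2 at every interior sample.
amplitude, omega = 0.5, 0.3  # assumed example values
tone = [amplitude * math.cos(omega * n) for n in range(64)]
energy = teager_energy(tone)
expected = (amplitude * math.sin(omega)) ** 2
```

For a pure tone the output depends on both amplitude and frequency, which is why the operator is often used as an energy measure. A cepstral front end would still need framing, a filterbank, and a cepstral transform, none of which are specified in this record.
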
| Published in: | Engineering Applications of Artificial Intelligence, Volume 59, pp. 15-22 |
|---|---|
| Main authors: | Grozdić, Đorđe T.; Jovičić, Slobodan T.; Subotić, Miško |
| Format: | Journal Article |
| Language: | English |
| Publication details: | Elsevier Ltd, 1 March 2017 |
| ISSN: | 0952-1976, 1873-6769 |

| Abstract | Deep denoising autoencoders (DDAEs) have recently shown state-of-the-art performance on various machine learning tasks. In this paper, the authors extend this approach to whispered speech recognition, one of the most challenging problems in automatic speech recognition (ASR). Because the acoustic characteristics of neutral and whispered speech differ profoundly, the performance of traditional ASR systems trained on neutral speech degrades significantly when they are tested on whispered speech. The proposed deep-learning system alleviates this mismatch between training and testing by applying a DDAE to generate whisper-robust cepstral features. The system was compared, in terms of word recognition accuracy, with a conventional hidden Markov model (HMM) speech recognizer on an isolated word recognition task using a real database of whispered speech (WhiSpe). Three types of cepstral coefficients were used in the experiments: Mel-frequency cepstral coefficients (MFCC), Teager-energy cepstral coefficients (TECC), and Teager-based Mel-frequency cepstral coefficients (TEMFCC). The experimental results show that the proposed system significantly improves whisper recognition accuracy and outperforms the traditional HMM-MFCC baseline, with an absolute improvement of 31% in whisper recognition accuracy. The highest word recognition rate for whispered speech, 92.81%, was achieved with the TECC features. |
|---|---|
| Author | Subotić, Miško; Jovičić, Slobodan T.; Grozdić, Đorđe T. |
| Author_xml | 1. Grozdić, Đorđe T. (djordjegrozdic@gmail.com), School of Electrical Engineering, University of Belgrade, Bulevar Kralja Aleksandra 73, 11000 Belgrade, Serbia; 2. Jovičić, Slobodan T. (jovicic@etf.rs), School of Electrical Engineering, University of Belgrade, Bulevar Kralja Aleksandra 73, 11000 Belgrade, Serbia; 3. Subotić, Miško (ifp2@ikomline.net), Life Activities Advancement Center, Laboratory for Forensic Acoustics and Phonetics, Gospodar Jovanova 35, 11000 Belgrade, Serbia |
| ContentType | Journal Article |
| Copyright | 2016 Elsevier Ltd |
| DOI | 10.1016/j.engappai.2016.12.012 |
| Discipline | Applied Sciences; Computer Science |
| EISSN | 1873-6769 |
| EndPage | 22 |
| ISICitedReferencesCount | 56 |
| ISSN | 0952-1976 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | Deep learning; Deep denoising autoencoder (DDAE); Automatic speech recognition (ASR); Whispered speech |
| Language | English |
| PageCount | 8 |
| PublicationDate | March 2017 (2017-03-01) |
| PublicationTitle | Engineering applications of artificial intelligence |
| PublicationYear | 2017 |
| Publisher | Elsevier Ltd |
| StartPage | 15 |
| SubjectTerms | Automatic speech recognition (ASR); Deep denoising autoencoder (DDAE); Deep learning; Whispered speech |
| Title | Whispered speech recognition using deep denoising autoencoder |
| URI | https://dx.doi.org/10.1016/j.engappai.2016.12.012 |
| Volume | 59 |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Whispered+speech+recognition+using+deep+denoising+autoencoder&rft.jtitle=Engineering+applications+of+artificial+intelligence&rft.au=Grozdi%C4%87%2C+%C4%90or%C4%91e+T.&rft.au=Jovi%C4%8Di%C4%87%2C+Slobodan+T.&rft.au=Suboti%C4%87%2C+Mi%C5%A1ko&rft.date=2017-03-01&rft.issn=0952-1976&rft.volume=59&rft.spage=15&rft.epage=22&rft_id=info:doi/10.1016%2Fj.engappai.2016.12.012&rft.externalDBID=n%2Fa&rft.externalDocID=10_1016_j_engappai_2016_12_012 |