Correlation distance skip connection denoising autoencoder (CDSK-DAE) for speech feature enhancement
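| Published in: | Applied acoustics, Vol. 163, Article 107213 |
|---|---|
| Main authors: | Badi, Alzahra; Park, Sangwook; Han, David K.; Ko, Hanseok |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Ltd, 1 June 2020 |
| Subject terms: | Automatic speech recognition (ASR); Skip connection Denoising Autoencoder (SK-DAE); Correlation distance measure (CDM) |
| ISSN: | 0003-682X, 1872-910X |
| Online access: | Full text |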
| Abstract | The performance of learning-based automatic speech recognition (ASR) is susceptible to noise, especially when the noise appears in the test data but not in the training data. This work focuses on feature enhancement for a noise-robust end-to-end ASR system and introduces a novel variant of the denoising autoencoder (DAE). The proposed method uses skip connections on both the encoder and decoder sides, passing speech information of the target frame from the input into the model. The model is also trained with a new objective function whose penalty terms use a correlation distance measure to quantify the dependency between the latent target features and the model's outputs (the latent features and the enhanced features obtained from the DAE). The proposed method was compared against a conventional model and a state-of-the-art model in both seen and unseen noisy environments, covering 7 types of background noise at SNR levels of 0, 5, 10 and 20 dB. It was also tested with both linear and non-linear penalty terms; both variants improve the overall average word error rate (WER) relative to the state-of-the-art model in both seen and unseen noisy conditions. |
|---|---|
| ArticleNumber | 107213 |
| Author | Badi, Alzahra (Electrical and Computer Engineering, Korea University, Republic of Korea); Park, Sangwook (Electrical and Computer Engineering, Johns Hopkins University, United States); Han, David K. (Information Sciences Division, U.S. Army Research Laboratory, United States); Ko, Hanseok (Electrical and Computer Engineering, Korea University, Republic of Korea; hsko@korea.ac.kr) |
| ContentType | Journal Article |
| Copyright | 2020 Elsevier Ltd |
| DOI | 10.1016/j.apacoust.2020.107213 |
| Discipline | Engineering Physics |
| EISSN | 1872-910X |
| ISICitedReferencesCount | 7 |
| ISSN | 0003-682X |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | Automatic speech recognition (ASR); Skip connection Denoising Autoencoder (SK-DAE); Correlation distance measure (CDM) |
| Language | English |
| PublicationDate | June 2020 |
| PublicationTitle | Applied acoustics |
| PublicationYear | 2020 |
| Publisher | Elsevier Ltd |
| StartPage | 107213 |
| URI | https://dx.doi.org/10.1016/j.apacoust.2020.107213 |
| Volume | 163 |
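
The abstract above names a "correlation distance measure" in the penalty terms but does not define it. As a minimal sketch, assuming it refers to the sample distance correlation of Székely, Rizzo and Bakirov (Ann. Stat. 35, 2007), the Python below computes that statistic for two sets of feature vectors with the same number of samples. The function `hypothetical_cd_loss`, its MSE reconstruction term, the `1 - dCor` penalty form and the weight `lam` are illustrative assumptions only, not the authors' published objective; the abstract also does not say what its "linear" and "non-linear" penalty variants are.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]  # n samples x d features


def _double_centered_distances(rows: Matrix) -> List[List[float]]:
    """Pairwise Euclidean distances, double-centred:
    A[j][k] = d[j][k] - mean(row j) - mean(column k) + grand mean."""
    n = len(rows)
    d = [[math.dist(a, b) for b in rows] for a in rows]
    row_means = [sum(r) / n for r in d]  # d is symmetric: column means equal row means
    grand = sum(row_means) / n
    return [[d[j][k] - row_means[j] - row_means[k] + grand for k in range(n)]
            for j in range(n)]


def distance_correlation(x: Matrix, y: Matrix) -> float:
    """Sample distance correlation (Szekely, Rizzo and Bakirov, 2007).

    x and y hold the same n samples, one feature vector per row; their
    feature dimensions may differ. The result lies in [0, 1] (up to
    rounding) and can be non-zero for non-linear dependence.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same number (>= 2) of samples")
    a = _double_centered_distances(x)
    b = _double_centered_distances(y)
    dcov2 = sum(a[j][k] * b[j][k] for j in range(n) for k in range(n)) / n**2
    dvar2_x = sum(v * v for row in a for v in row) / n**2
    dvar2_y = sum(v * v for row in b for v in row) / n**2
    denom = math.sqrt(dvar2_x * dvar2_y)
    if denom == 0.0:
        return 0.0  # a constant input carries no dependence information
    return math.sqrt(max(dcov2, 0.0) / denom)


def hypothetical_cd_loss(clean: Matrix, enhanced: Matrix,
                         latent_target: Matrix, latent: Matrix,
                         lam: float = 0.1) -> float:
    """HYPOTHETICAL objective: MSE between clean and enhanced features plus
    lam-weighted penalties that shrink as the latent target features become
    more dependent on the DAE's latent features and on its enhanced output."""
    sq_err = [(c - e) ** 2 for cr, er in zip(clean, enhanced) for c, e in zip(cr, er)]
    mse = sum(sq_err) / len(sq_err)
    penalty = ((1.0 - distance_correlation(latent_target, latent))
               + (1.0 - distance_correlation(latent_target, enhanced)))
    return mse + lam * penalty
```

For x = (−1, 0, 1) and y = x², the Pearson correlation is 0, yet `distance_correlation` returns about 0.56 (exactly 10^(−1/4)), which illustrates the kind of non-linear dependence a distance-based measure can detect. The plain O(n²) Python is chosen for readability; a training loss would normally be written in the deep-learning framework itself so that gradients flow through the penalty.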