Correlation distance skip connection denoising autoencoder (CDSK-DAE) for speech feature enhancement

The performance of learning-based automatic speech recognition (ASR) is susceptible to noise, especially when the noise appears in the test data but is not present in the training data. This work focuses on feature enhancement for noise-robust end-to-end ASR systems by introducing a novel variant of...

Full description

Bibliographic details
Published in: Applied acoustics Vol. 163; p. 107213
Main authors: Badi, Alzahra, Park, Sangwook, Han, David K., Ko, Hanseok
Format: Journal Article
Language: English
Published: Elsevier Ltd, 1 June 2020
Subjects: Automatic speech recognition (ASR); Skip connection Denoising Autoencoder (SK-DAE); Correlation distance measure (CDM)
ISSN: 0003-682X, 1872-910X
Online access: Full text
Abstract The performance of learning-based automatic speech recognition (ASR) is susceptible to noise, especially when the noise appears in the test data but is not present in the training data. This work focuses on feature enhancement for noise-robust end-to-end ASR systems by introducing a novel variant of the denoising autoencoder (DAE). The proposed method uses skip connections on both the encoder and decoder sides, passing speech information of the target frame from the input into the model. It also trains the model with a new objective function whose penalty terms use a correlation distance measure to quantify the dependency between the latent target features and the model's features (the latent features and the enhanced features obtained from the DAE). The proposed method was compared against a conventional model and a state-of-the-art model in both seen and unseen noisy environments, covering 7 types of background noise at different SNR levels (0, 5, 10 and 20 dB). It was also tested with linear and non-linear penalty terms; both variants improve the overall average word error rate (WER) under seen and unseen noisy conditions compared with the state-of-the-art model.
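The abstract does not state how the correlation distance measure is computed, so the following is only a minimal sketch of two common definitions that a linear and a non-linear penalty term could plausibly build on: one minus the Pearson correlation, and the sample distance correlation of Székely et al. (2007). The function names, the NumPy implementation, and the convention for constant inputs are assumptions made for illustration; they are not taken from the article.

```python
import numpy as np


def correlation_distance(x, y):
    """Linear measure: 1 - Pearson correlation of two vectors (range 0..2)."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.linalg.norm(xc) * np.linalg.norm(yc)
    if denom == 0.0:
        return 1.0  # assumed convention: a constant vector is "uncorrelated"
    return 1.0 - float(xc @ yc) / denom


def _double_centered_distances(a):
    """Pairwise Euclidean distances between rows, double-centered."""
    a = np.asarray(a, dtype=float)
    if a.ndim == 1:
        a = a[:, None]  # treat a 1-D input as n samples of one feature
    d = np.linalg.norm(a[:, None, :] - a[None, :, :], axis=-1)
    return (d - d.mean(axis=0, keepdims=True)
              - d.mean(axis=1, keepdims=True) + d.mean())


def distance_correlation(x, y):
    """Non-linear measure: sample distance correlation (range 0..1).

    x and y hold paired samples, one sample per row.
    """
    A = _double_centered_distances(x)
    B = _double_centered_distances(y)
    dcov2 = (A * B).mean()          # squared sample distance covariance
    dvar2_x = (A * A).mean()        # squared sample distance variance of x
    dvar2_y = (B * B).mean()        # squared sample distance variance of y
    denom = np.sqrt(dvar2_x * dvar2_y)
    if denom == 0.0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))


if __name__ == "__main__":
    x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
    y = x ** 2  # depends on x, but not linearly
    print("correlation distance:", correlation_distance(x, y))
    print("distance correlation:", distance_correlation(x, y))
```

In this toy example the Pearson-based distance treats `y = x ** 2` as unrelated to `x`, while the distance correlation is clearly positive. That difference is one reason a non-linear dependency measure might be tried alongside a linear one in a training penalty.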
ArticleNumber 107213
Author Ko, Hanseok
Han, David K.
Park, Sangwook
Badi, Alzahra
Author_xml – sequence: 1
  givenname: Alzahra
  surname: Badi
  fullname: Badi, Alzahra
  organization: Electrical and Computer Engineering, Korea University, Republic of Korea
– sequence: 2
  givenname: Sangwook
  surname: Park
  fullname: Park, Sangwook
  organization: Electrical and Computer Engineering, Johns Hopkins University, United States
– sequence: 3
  givenname: David K.
  surname: Han
  fullname: Han, David K.
  organization: Information Sciences Division, U.S. Army Research Laboratory, United States
– sequence: 4
  givenname: Hanseok
  surname: Ko
  fullname: Ko, Hanseok
  email: hsko@korea.ac.kr
  organization: Electrical and Computer Engineering, Korea University, Republic of Korea
CitedBy_id crossref_primary_10_1109_ACCESS_2025_3542953
crossref_primary_10_1109_LSP_2022_3203911
crossref_primary_10_1007_s11802_023_5309_y
crossref_primary_10_1016_j_cma_2024_117071
crossref_primary_10_1016_j_renene_2022_05_141
Cites_doi 10.1109/TASL.2011.2109382
10.1109/TASL.2011.2134090
10.1214/009053607000000505
10.1109/ASRU.2015.7404790
10.1007/s00417-006-0391-6
10.1109/ASRU.2017.8268911
10.1006/csla.2001.0174
ContentType Journal Article
Copyright 2020 Elsevier Ltd
Copyright_xml – notice: 2020 Elsevier Ltd
DOI 10.1016/j.apacoust.2020.107213
Discipline Engineering
Physics
EISSN 1872-910X
ExternalDocumentID 10_1016_j_apacoust_2020_107213
S0003682X19308175
ISICitedReferencesCount 7
ISSN 0003-682X
IngestDate Sat Nov 29 07:31:42 EST 2025
Tue Nov 18 20:29:11 EST 2025
Fri Feb 23 02:44:45 EST 2024
IsPeerReviewed true
IsScholarly true
Keywords Automatic speech recognition (ASR)
Skip connection Denoising Autoencoder (SK-DAE)
Correlation distance measure (CDM)
Language English
LinkModel OpenURL
ParticipantIDs crossref_citationtrail_10_1016_j_apacoust_2020_107213
crossref_primary_10_1016_j_apacoust_2020_107213
elsevier_sciencedirect_doi_10_1016_j_apacoust_2020_107213
PublicationCentury 2000
PublicationDate June 2020
2020-06-00
PublicationDateYYYYMMDD 2020-06-01
PublicationDate_xml – month: 06
  year: 2020
  text: June 2020
PublicationDecade 2020
PublicationTitle Applied acoustics
PublicationYear 2020
Publisher Elsevier Ltd
Publisher_xml – name: Elsevier Ltd
URI https://dx.doi.org/10.1016/j.apacoust.2020.107213
Volume 163