Attention based convolutional recurrent neural network for environmental sound classification

Bibliographic Details
Published in: Neurocomputing (Amsterdam), Vol. 453, pp. 896–903
Main Authors: Zhang, Zhichao, Xu, Shugong, Zhang, Shunqing, Qiao, Tianhao, Cao, Shan
Format: Journal Article
Language:English
Published: Elsevier B.V., 17.09.2021
Subjects: Attention mechanism; Convolutional recurrent neural network; Environmental sound classification
ISSN:0925-2312, 1872-8286
Abstract Highlights:
• We employ an attention model to automatically focus on the semantically relevant frames for ESC.
• We propose a novel convolutional RNN model to analyze temporal relations for ESC.
• We apply a data augmentation pipeline to further improve performance for ESC.
Environmental sound classification (ESC) is a challenging problem due to the complexity of sounds. Classification performance depends heavily on the effectiveness of the representative features extracted from environmental sounds. However, ESC often suffers from semantically irrelevant frames and silent frames. To address this, we employ a frame-level attention model that focuses on the semantically relevant and salient frames. Specifically, we first propose a convolutional recurrent neural network to learn spectro-temporal features and temporal correlations. We then extend this convolutional RNN with a frame-level attention mechanism to learn discriminative feature representations for ESC. We investigated the classification performance of different attention scaling functions and of applying attention at different layers. Experiments were conducted on the ESC-50 and ESC-10 datasets. The results demonstrate the effectiveness of the proposed method, which achieves state-of-the-art or competitive classification accuracy with lower computational complexity. We also visualized the attention results and observed that the proposed attention mechanism led the network to focus on the semantically relevant parts of environmental sounds.
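To make the architecture described in the abstract concrete, the following is a minimal sketch of frame-level attention pooling on top of a convolutional RNN. It is not the authors' published implementation: the log-mel input shape, layer sizes, the bidirectional GRU, and the softmax used as the attention scaling function are all illustrative assumptions based only on the abstract.

```python
# Minimal sketch (not the paper's released code): a convolutional RNN with
# frame-level attention pooling for environmental sound classification.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttnCRNN(nn.Module):
    def __init__(self, n_mels=128, n_classes=50, rnn_hidden=256):
        super().__init__()
        # Convolutional front end: learns spectro-temporal features;
        # pooling reduces frequency only, so the time frames are preserved.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d((4, 1)),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d((4, 1)),
        )
        feat_dim = 64 * (n_mels // 16)
        # Bidirectional GRU: models temporal correlations across frames.
        self.rnn = nn.GRU(feat_dim, rnn_hidden, batch_first=True,
                          bidirectional=True)
        # Frame-level attention: one scalar score per time frame.
        self.attn = nn.Linear(2 * rnn_hidden, 1)
        self.fc = nn.Linear(2 * rnn_hidden, n_classes)

    def forward(self, x):                   # x: (batch, 1, n_mels, time)
        h = self.conv(x)                    # (batch, 64, n_mels//16, time)
        b, c, f, t = h.shape
        h = h.permute(0, 3, 1, 2).reshape(b, t, c * f)
        h, _ = self.rnn(h)                  # (batch, time, 2*rnn_hidden)
        scores = self.attn(h).squeeze(-1)   # (batch, time)
        w = F.softmax(scores, dim=1)        # attention weights over frames
        # Weighted sum downweights silent / semantically irrelevant frames.
        z = torch.bmm(w.unsqueeze(1), h).squeeze(1)
        return self.fc(z)                   # class logits

if __name__ == "__main__":
    model = AttnCRNN()
    x = torch.randn(4, 1, 128, 431)         # e.g. 4 log-mel clips of ~5 s
    print(model(x).shape)                    # torch.Size([4, 50])
```

The softmax over per-frame scores is only one candidate for the "attention scaling function" the abstract compares; the attention-weighted sum replaces plain average pooling so that salient frames dominate the clip-level representation.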
Author Zhang, Zhichao
Xu, Shugong
Zhang, Shunqing
Qiao, Tianhao
Cao, Shan
Author_xml – sequence: 1
  givenname: Zhichao
  surname: Zhang
  fullname: Zhang, Zhichao
– sequence: 2
  givenname: Shugong
  surname: Xu
  fullname: Xu, Shugong
  email: shugong@shu.edu.cn
– sequence: 3
  givenname: Shunqing
  surname: Zhang
  fullname: Zhang, Shunqing
– sequence: 4
  givenname: Tianhao
  surname: Qiao
  fullname: Qiao, Tianhao
– sequence: 5
  givenname: Shan
  surname: Cao
  fullname: Cao, Shan
ContentType Journal Article
Copyright 2020 The Authors
DOI 10.1016/j.neucom.2020.08.069
Discipline Computer Science
EISSN 1872-8286
EndPage 903
ExternalDocumentID 10_1016_j_neucom_2020_08_069
S0925231220313618
ISICitedReferencesCount 81
ISSN 0925-2312
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Keywords Convolutional recurrent neural network
Attention mechanism
Environmental sound classification
Language English
License This is an open access article under the CC BY-NC-ND license.
OpenAccessLink https://dx.doi.org/10.1016/j.neucom.2020.08.069
PageCount 8
PublicationDate 2021-09-17
PublicationTitle Neurocomputing (Amsterdam)
PublicationYear 2021
Publisher Elsevier B.V
StartPage 896
SubjectTerms Attention mechanism
Convolutional recurrent neural network
Environmental sound classification
Title Attention based convolutional recurrent neural network for environmental sound classification
URI https://dx.doi.org/10.1016/j.neucom.2020.08.069
Volume 453