Attention based convolutional recurrent neural network for environmental sound classification
| Published in: | Neurocomputing (Amsterdam) Vol. 453; pp. 896 - 903 |
|---|---|
| Main Authors: | Zhang, Zhichao, Xu, Shugong, Zhang, Shunqing, Qiao, Tianhao, Cao, Shan |
| Format: | Journal Article |
| Language: | English |
| Published: |
Elsevier B.V
17.09.2021
|
| Subjects: | |
| ISSN: | 0925-2312, 1872-8286 |
| Online Access: | Get full text |
|
| Abstract | [Display omitted]
•We employ an attention model to automatically focus on the semantically relevant frames for ESC.•We propose a novel convolutional RNN model to analyze temporal relations for ESC.•We apply a data augmentation pipeline to further improve performance for ESC.
Environmental sound classification (ESC) is a challenging problem due to the complexity of sounds. Classification performance depends heavily on the effectiveness of the representative features extracted from the environmental sounds. However, ESC often suffers from semantically irrelevant frames and silent frames. To deal with this, we employ a frame-level attention model to focus on the semantically relevant and salient frames. Specifically, we first propose a convolutional recurrent neural network to learn spectro-temporal features and temporal correlations. Then, we extend our convolutional RNN model with a frame-level attention mechanism to learn discriminative feature representations for ESC. We investigated the classification performance when using different attention scaling functions and applying attention at different layers. Experiments were conducted on the ESC-50 and ESC-10 datasets. The results demonstrated the effectiveness of the proposed method, which achieved state-of-the-art or competitive classification accuracy with lower computational complexity. We also visualized our attention results and observed that the proposed attention mechanism was able to lead the network to focus on the semantically relevant parts of environmental sounds. |
|---|---|
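The abstract describes frame-level attention that weights semantically relevant frames before classification. As a rough illustration only (not the authors' exact formulation: the tanh scoring function, the weight shapes, and all names below are assumptions), attention pooling over RNN frame outputs can be sketched in numpy as:

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - np.max(x))
    return e / e.sum()

def frame_attention_pool(H, W, v):
    """Pool frame-level features H (T x D) into a single clip-level vector.

    score_t = v . tanh(W h_t);  alpha = softmax(scores);  c = sum_t alpha_t h_t
    Frames with higher scores contribute more to the pooled representation.
    """
    scores = np.tanh(H @ W.T) @ v   # (T,) one relevance score per frame
    alpha = softmax(scores)         # (T,) attention weights, non-negative, sum to 1
    context = alpha @ H             # (D,) attention-weighted average of frames
    return context, alpha

# toy example: 5 "frames" of 8-dim RNN features (hypothetical sizes)
rng = np.random.default_rng(0)
H = rng.standard_normal((5, 8))
W = rng.standard_normal((8, 8))
v = rng.standard_normal(8)
context, alpha = frame_attention_pool(H, W, v)
```

The paper additionally studies different scaling functions for the attention scores and at which layers attention is applied; in this sketch, swapping `np.tanh` for another nonlinearity changes only the scoring step.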
| Author | Cao, Shan Zhang, Zhichao Zhang, Shunqing Xu, Shugong Qiao, Tianhao |
| Author_xml | – sequence: 1 givenname: Zhichao surname: Zhang fullname: Zhang, Zhichao – sequence: 2 givenname: Shugong surname: Xu fullname: Xu, Shugong email: shugong@shu.edu.cn – sequence: 3 givenname: Shunqing surname: Zhang fullname: Zhang, Shunqing – sequence: 4 givenname: Tianhao surname: Qiao fullname: Qiao, Tianhao – sequence: 5 givenname: Shan surname: Cao fullname: Cao, Shan |
| BookMark | eNqFkM9OAyEQh4mpiW31DTzsC-wKFHZZDyZN47-kiRc9GgLsbELdggG2xreXtZ486GmSmfl-A98CzZx3gNAlwRXBpL7aVQ5G4_cVxRRXWFS4bk_QnIiGloKKeobmuKW8pCtCz9Aixh3GpCG0naPXdUrgkvWu0CpCVxjvDn4Yp44aigBmDCEvFPlCyA0H6cOHt6L3oQB3sMG7fR7nSfSjy_igYrS9NWpKOEenvRoiXPzUJXq5u33ePJTbp_vHzXpbGkZEKqlSda-50Jh3RmndMoM5gFC051wzrjElxCjAq5blxYbVjTZcUwW6J6yjqyW6Puaa4GMM0Etj0_cLUlB2kATLSZTcyaMoOYmSWMgsKsPsF_we7F6Fz_-wmyMG-WMHC0FGY8EZ6Gy2lmTn7d8BX_bvi0k |
| CitedBy_id | crossref_primary_10_1109_ACCESS_2022_3232807 crossref_primary_10_1007_s11042_025_20820_3 crossref_primary_10_1109_ACCESS_2024_3459423 crossref_primary_10_3390_app15158413 crossref_primary_10_1016_j_oceaneng_2023_115863 crossref_primary_10_3390_jmse12101862 crossref_primary_10_1016_j_asoc_2024_112619 crossref_primary_10_1016_j_dsp_2023_104170 crossref_primary_10_1016_j_engappai_2025_110622 crossref_primary_10_1007_s11042_023_17066_2 crossref_primary_10_1038_s41598_022_10382_x crossref_primary_10_1007_s11265_021_01702_x crossref_primary_10_1016_j_apacoust_2025_110636 crossref_primary_10_3390_app13169358 crossref_primary_10_1109_ACCESS_2022_3222495 crossref_primary_10_1109_ACCESS_2025_3590626 crossref_primary_10_1016_j_eswa_2024_123768 crossref_primary_10_1038_s41598_022_13237_7 crossref_primary_10_1109_TASLP_2023_3244507 crossref_primary_10_32604_cmc_2023_032719 crossref_primary_10_1007_s11042_023_16024_2 crossref_primary_10_1016_j_artmed_2024_102903 crossref_primary_10_1109_TIM_2023_3260282 crossref_primary_10_1007_s10462_023_10625_x crossref_primary_10_1016_j_compeleceng_2022_108252 crossref_primary_10_1007_s11042_024_18421_7 crossref_primary_10_1016_j_apacoust_2022_108813 crossref_primary_10_3390_electronics11223743 crossref_primary_10_3390_s22228608 crossref_primary_10_1016_j_apacoust_2021_108437 crossref_primary_10_1007_s11042_023_17982_3 crossref_primary_10_1016_j_culher_2024_06_011 crossref_primary_10_1016_j_ecoinf_2024_102471 crossref_primary_10_1016_j_bspc_2024_107086 crossref_primary_10_3390_s22186818 crossref_primary_10_1016_j_apacoust_2024_110463 crossref_primary_10_1109_TCDS_2022_3222350 crossref_primary_10_1016_j_csl_2025_101868 crossref_primary_10_1016_j_dsp_2025_105234 crossref_primary_10_1007_s42044_025_00289_x crossref_primary_10_3389_fcomp_2025_1517346 crossref_primary_10_1016_j_ecoinf_2023_102065 crossref_primary_10_3390_app15052758 crossref_primary_10_1177_14613484251347090 crossref_primary_10_1049_cit2_12375 
crossref_primary_10_1016_j_apacoust_2022_109168 crossref_primary_10_1109_ACCESS_2022_3185224 crossref_primary_10_1016_j_procs_2023_12_111 crossref_primary_10_1142_S0219649223500284 crossref_primary_10_3390_s22228874 crossref_primary_10_1016_j_neucom_2024_128727 crossref_primary_10_1007_s43926_023_00049_y crossref_primary_10_3390_s22155566 crossref_primary_10_1007_s10489_023_04973_y crossref_primary_10_1016_j_scitotenv_2024_176083 crossref_primary_10_1371_journal_pone_0274395 crossref_primary_10_1016_j_neucom_2024_128136 crossref_primary_10_1016_j_asoc_2023_110423 crossref_primary_10_1109_ACCESS_2023_3318015 crossref_primary_10_1186_s12880_022_00933_z crossref_primary_10_3390_fi15020065 crossref_primary_10_1016_j_neucom_2022_07_056 crossref_primary_10_1007_s11042_022_11994_1 crossref_primary_10_3390_app12073502 crossref_primary_10_3390_app14219711 crossref_primary_10_3390_s22093118 crossref_primary_10_1016_j_iswa_2022_200115 crossref_primary_10_3390_d16080509 crossref_primary_10_3390_s22124453 |
| Cites_doi | 10.1109/MSP.2010.937498 10.1109/TASLP.2017.2778423 10.1109/TASLP.2017.2690570 10.1109/TASL.2009.2017438 10.1016/j.asoc.2009.12.033 10.1145/2733373.2806390 10.1109/ASPAA.2005.1540194 10.1016/j.procs.2017.08.250 10.1109/IJCNN.2018.8489641 10.1109/TASLP.2015.2389618 10.21437/Interspeech.2019-3019 10.1109/MSP.2014.2326181 10.1109/TMM.2012.2199972 10.1109/MLSP.2015.7324337 10.1109/ICASSP.2017.7952190 10.3390/app8071152 |
| ContentType | Journal Article |
| Copyright | 2020 The Authors |
| Copyright_xml | – notice: 2020 The Authors |
| DBID | 6I. AAFTH AAYXX CITATION |
| DOI | 10.1016/j.neucom.2020.08.069 |
| DatabaseName | ScienceDirect Open Access Titles Elsevier:ScienceDirect:Open Access CrossRef |
| DatabaseTitle | CrossRef |
| DatabaseTitleList | |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISSN | 1872-8286 |
| EndPage | 903 |
| ExternalDocumentID | 10_1016_j_neucom_2020_08_069 S0925231220313618 |
| GroupedDBID | --- --K --M .DC .~1 0R~ 123 1B1 1~. 1~5 4.4 457 4G. 53G 5VS 6I. 7-5 71M 8P~ 9JM 9JN AABNK AACTN AADPK AAEDT AAEDW AAFTH AAIAV AAIKJ AAKOC AALRI AAOAW AAQFI AAXLA AAXUO AAYFN ABBOA ABCQJ ABFNM ABJNI ABMAC ABYKQ ACDAQ ACGFS ACRLP ACZNC ADBBV ADEZE AEBSH AEKER AENEX AFKWA AFTJW AFXIZ AGHFR AGUBO AGWIK AGYEJ AHHHB AHZHX AIALX AIEXJ AIKHN AITUG AJOXV ALMA_UNASSIGNED_HOLDINGS AMFUW AMRAJ AOUOD AXJTR BKOJK BLXMC CS3 DU5 EBS EFJIC EFLBG EO8 EO9 EP2 EP3 F5P FDB FIRID FNPLU FYGXN G-Q GBLVA GBOLZ IHE J1W KOM LG9 M41 MO0 MOBAO N9A O-L O9- OAUVE OZT P-8 P-9 P2P PC. Q38 ROL RPZ SDF SDG SDP SES SPC SPCBC SSN SSV SSZ T5K ZMT ~G- 29N 9DU AAQXK AATTM AAXKI AAYWO AAYXX ABWVN ABXDB ACLOT ACNNM ACRPL ACVFH ADCNI ADJOM ADMUD ADNMO AEIPS AEUPX AFJKZ AFPUW AGQPQ AIGII AIIUN AKBMS AKRWK AKYEP ANKPU APXCP ASPBG AVWKF AZFZN CITATION EFKBS EJD FEDTE FGOYB HLZ HVGLF HZ~ R2- SBC SEW WUQ XPP ~HD |
| ID | FETCH-LOGICAL-c418t-2aa6fb58b05dcabb94c05ee8a2f55b45b0211cae03946fb7467bc5b2aebf14d23 |
| ISICitedReferencesCount | 81 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000663418300009&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 0925-2312 |
| IngestDate | Sat Nov 29 07:15:37 EST 2025 Tue Nov 18 21:44:12 EST 2025 Fri Feb 23 02:43:48 EST 2024 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | Convolutional recurrent neural network Attention mechanism Environmental sound classification |
| Language | English |
| License | This is an open access article under the CC BY-NC-ND license. |
| LinkModel | OpenURL |
| MergedId | FETCHMERGED-LOGICAL-c418t-2aa6fb58b05dcabb94c05ee8a2f55b45b0211cae03946fb7467bc5b2aebf14d23 |
| OpenAccessLink | https://dx.doi.org/10.1016/j.neucom.2020.08.069 |
| PageCount | 8 |
| ParticipantIDs | crossref_citationtrail_10_1016_j_neucom_2020_08_069 crossref_primary_10_1016_j_neucom_2020_08_069 elsevier_sciencedirect_doi_10_1016_j_neucom_2020_08_069 |
| PublicationCentury | 2000 |
| PublicationDate | 2021-09-17 |
| PublicationDateYYYYMMDD | 2021-09-17 |
| PublicationDate_xml | – month: 09 year: 2021 text: 2021-09-17 day: 17 |
| PublicationDecade | 2020 |
| PublicationTitle | Neurocomputing (Amsterdam) |
| PublicationYear | 2021 |
| Publisher | Elsevier B.V |
| Publisher_xml | – name: Elsevier B.V |
| References | W. Jun, L. Shengchen, Self-Attention Mechanism Based System for DCASE2018 Challenge Task1 and Task4. DCASE2018 Challenge, Tech. Rep, 2018 Z. Ren, et. al., Attention-based Convolutional Neural Networks for Acoustic Scene Classification. DCASE2018 Challenge, Tech. Rep., 2018. McLoughlin, Zhang, Xie, Song, Xiao (b0085) 2015; 23 Geiger, Helwani (b0050) 2015 H. Zhang, M. Cisse, Y.N. Dauphin, D. Lopez-Paz, Mixup: Beyond Empirical Risk Minimization, 2017. arXiv preprint arXiv:1710.09412. K.J. Piczak, ESC: Dataset for Environmental Sound Classification, in: Proc. 23rd ACM Int. Conf. Multimedia, 2015, pp. 1015–1018. Valero, Alias (b0135) 2012; 14 Bisot, Serizel, Essid, Richard (b0020) 2017; 25 Zhang, Xu, Qiao, Zhang, Cao (b0160) 2019 D. Bahdanau, K. Cho, Y. Bengio, Neural Machine Translation by Jointly Learning to Align and Translate, 2014. arXiv preprint arXiv:1409.0473 R. Radhakrishnan, A. Divakaran, A. Smaragdis, Audio Analysis for Surveillance Applications, in: Proc. IEEE Workshop Appl. Signal Process. Audio Acoust., 2005, pp. 158–161. Chu, Narayanan, Kuo (b0035) 2009; 17 K.J. Piczak, Environmental Sound Classification with Convolutional Neural Networks, in: Proc. 25th Int. Workshop Mach. Learning Signal Process., 2015, pp. 1–6. B. Sankaran, H. Mi, Y. Al-Onaizan, A. Ittycheriah, Temporal Attention Model for Neural Machine Translation, 2016. arXiv preprint arXiv:1608.02927 Dhanalakshmi, Palanivel, Ramalingam (b0045) 2011; 11 Guo, Xu, Li, Alwan (b0055) 2017 Vacher, Serignat, Chaillol (b0130) 2007 Y. Tokozume, Y. Ushiku, T. Harada, Learning from Between-Class Examples for Deep Sound Recognition, 2017. arXiv preprint arXiv:1711.10282. T.H. Vu, J.C. Wang, Acoustic Scene and Event Recognition Using Recurrent Neural Networks. DCASE2016 Challenge, Tech. Rep., 2016. 
Yang, Yang, Dyer, He, Smola, Hovy (b0145) 2016 Barchiesi, Giannoulis, Stowell, Plumbley (b0015) 2015; 32 Zhang, Xu, Cao, Zhang (b0155) 2018 Li, Yao, Hu, Liu, Yao, Hu (b0070) 2018; 8 Lyon (b0080) 2010; 27 J.K. Chorowski, D. Bahdanau, D. Serdyuk, K. Cho, Y. Bengio, Attention-based Models for Speech Recognition, in: Proc. Int. Conf. Neural Inf. Process. Syst., 2015, pp. 577–585. Mesaros, Heittola, Benetos, Foster, Lagrange, Virtanen, Plumbley (b0090) 2018; 26 Boddapati, Petef, Rasmusson, Lundberg (b0025) 2017; 112 S. Ioffe, C. Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015. arXiv preprint arXiv:1502.03167. W. Dai, C. Dai, S. Qu, J. Li, S. Das, Very Deep Convolutional Neural Networks for Raw Waveforms, in: Proc. Int. Conf. Acoust., Speech, Signal Process., 2017, pp. 421–425. B. Zhu, C. Wang, F. Liu, J. Lei, Z. Lu, Y. Peng, Learning Environmental Sounds with Multi-scale Convolutional Neural Network, 2018. arXiv preprint arXiv:1803.10219 X. Li, V. Chebiyyam, K. Kirchhoff, Multi-stream Network with Temporal Attention for Environmental Sound Classification, 2019. arXiv preprint arXiv:1901.08608. 
Aytar, Vondrick, Torralba (b0005) 2016 Piczak (b0105) 2015 10.1016/j.neucom.2020.08.069_b0125 Vacher (10.1016/j.neucom.2020.08.069_b0130) 2007 Zhang (10.1016/j.neucom.2020.08.069_b0160) 2019 10.1016/j.neucom.2020.08.069_b0100 Barchiesi (10.1016/j.neucom.2020.08.069_b0015) 2015; 32 10.1016/j.neucom.2020.08.069_b0040 Zhang (10.1016/j.neucom.2020.08.069_b0155) 2018 Li (10.1016/j.neucom.2020.08.069_b0070) 2018; 8 10.1016/j.neucom.2020.08.069_b0060 10.1016/j.neucom.2020.08.069_b0165 10.1016/j.neucom.2020.08.069_b0065 10.1016/j.neucom.2020.08.069_b0120 10.1016/j.neucom.2020.08.069_b0140 Yang (10.1016/j.neucom.2020.08.069_b0145) 2016 Boddapati (10.1016/j.neucom.2020.08.069_b0025) 2017; 112 Geiger (10.1016/j.neucom.2020.08.069_b0050) 2015 Lyon (10.1016/j.neucom.2020.08.069_b0080) 2010; 27 Bisot (10.1016/j.neucom.2020.08.069_b0020) 2017; 25 Dhanalakshmi (10.1016/j.neucom.2020.08.069_b0045) 2011; 11 McLoughlin (10.1016/j.neucom.2020.08.069_b0085) 2015; 23 10.1016/j.neucom.2020.08.069_b0115 10.1016/j.neucom.2020.08.069_b0095 10.1016/j.neucom.2020.08.069_b0150 10.1016/j.neucom.2020.08.069_b0110 10.1016/j.neucom.2020.08.069_b0010 10.1016/j.neucom.2020.08.069_b0075 Piczak (10.1016/j.neucom.2020.08.069_b0105) 2015 10.1016/j.neucom.2020.08.069_b0030 Chu (10.1016/j.neucom.2020.08.069_b0035) 2009; 17 Mesaros (10.1016/j.neucom.2020.08.069_b0090) 2018; 26 Guo (10.1016/j.neucom.2020.08.069_b0055) 2017 Valero (10.1016/j.neucom.2020.08.069_b0135) 2012; 14 Aytar (10.1016/j.neucom.2020.08.069_b0005) 2016 |
| References_xml | – reference: S. Ioffe, C. Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015. arXiv preprint arXiv:1502.03167. – volume: 14 start-page: 1684 year: 2012 end-page: 1689 ident: b0135 article-title: Gammatone cepstral coefficients: biologically inspired features for non-speech audio classification publication-title: IEEE Trans. Multimedia – volume: 112 start-page: 2048 year: 2017 end-page: 2056 ident: b0025 article-title: Classifying environmental sounds using image recognition networks publication-title: Proc. Comput. Sci. – volume: 26 start-page: 379 year: 2018 end-page: 393 ident: b0090 article-title: Detection and classification of acoustic scenes and events: outcome of the DCASE 2016 challenge publication-title: IEEE/ACM Trans. Audio Speech Lang. Process. – reference: Z. Ren, et. al., Attention-based Convolutional Neural Networks for Acoustic Scene Classification. DCASE2018 Challenge, Tech. Rep., 2018. – volume: 25 start-page: 1216 year: 2017 end-page: 1229 ident: b0020 article-title: Feature learning with matrix factorization applied to acoustic scene classification publication-title: IEEE/ACM Trans. Audio Speech Language Process – reference: T.H. Vu, J.C. Wang, Acoustic Scene and Event Recognition Using Recurrent Neural Networks. DCASE2016 Challenge, Tech. Rep., 2016. – reference: R. Radhakrishnan, A. Divakaran, A. Smaragdis, Audio Analysis for Surveillance Applications, in: Proc. IEEE Workshop Appl. Signal Process. Audio Acoust., 2005, pp. 158–161. – volume: 27 start-page: 131 year: 2010 end-page: 139 ident: b0080 article-title: Machine hearing: an emerging field [Exploratory DSP] publication-title: IEEE Signal Process. Mag. – reference: W. Dai, C. Dai, S. Qu, J. Li, S. Das, Very Deep Convolutional Neural Networks for Raw Waveforms, in: Proc. Int. Conf. Acoust., Speech, Signal Process., 2017, pp. 421–425. 
– volume: 11 start-page: 716 year: 2011 end-page: 723 ident: b0045 article-title: Classification of audio signals using AANN and GMM publication-title: Appl. Soft Comput. – reference: K.J. Piczak, Environmental Sound Classification with Convolutional Neural Networks, in: Proc. 25th Int. Workshop Mach. Learning Signal Process., 2015, pp. 1–6. – reference: K.J. Piczak, ESC: Dataset for Environmental Sound Classification, in: Proc. 23rd ACM Int. Conf. Multimedia, 2015, pp. 1015–1018. – reference: D. Bahdanau, K. Cho, Y. Bengio, Neural Machine Translation by Jointly Learning to Align and Translate, 2014. arXiv preprint arXiv:1409.0473 – volume: 17 start-page: 1142 year: 2009 end-page: 1158 ident: b0035 article-title: Environmental sound recognition with time-frequency audio features publication-title: IEEE Trans. Audio Speech Language Process. – reference: B. Sankaran, H. Mi, Y. Al-Onaizan, A. Ittycheriah, Temporal Attention Model for Neural Machine Translation, 2016. arXiv preprint arXiv:1608.02927 – reference: B. Zhu, C. Wang, F. Liu, J. Lei, Z. Lu, Y. Peng, Learning Environmental Sounds with Multi-scale Convolutional Neural Network, 2018. arXiv preprint arXiv:1803.10219 – start-page: 892 year: 2016 end-page: 900 ident: b0005 article-title: Soundnet: learning sound representations from unlabeled video publication-title: Proc. Int. Conf. Neural Inf. Process. Syst. – volume: 8 start-page: 1152 year: 2018 ident: b0070 article-title: An ensemble stacked convolutional neural network model for environmental event sound recognition publication-title: Appl. Sci. – start-page: 1480 year: 2016 end-page: 1489 ident: b0145 article-title: Hierarchical attention networks for document classification publication-title: Proc. NAACL-HLT – volume: 23 start-page: 540 year: 2015 end-page: 552 ident: b0085 article-title: Robust sound event classification using deep neural networks publication-title: IEEE/ACM Trans. Audio, Speech, Language Process. – reference: H. Zhang, M. Cisse, Y.N. 
Dauphin, D. Lopez-Paz, Mixup: Beyond Empirical Risk Minimization, 2017. arXiv preprint arXiv:1710.09412. – start-page: 356 year: 2018 end-page: 367 ident: b0155 article-title: Deep Convolutional Neural Network with Mixup for Environmental Sound Classification publication-title: Proc. Chinese Conf. Pattern Recognit. Comput. Vision – reference: W. Jun, L. Shengchen, Self-Attention Mechanism Based System for DCASE2018 Challenge Task1 and Task4. DCASE2018 Challenge, Tech. Rep, 2018 – start-page: 135 year: 2007 end-page: 146 ident: b0130 article-title: Sound classification in a smart room environment: an approach using GMM and HMM methods publication-title: Proc. 4th IEEE Conf. Speech Technique, Human-Computer Dialogue – reference: X. Li, V. Chebiyyam, K. Kirchhoff, Multi-stream Network with Temporal Attention for Environmental Sound Classification, 2019. arXiv preprint arXiv:1901.08608. – start-page: 1015 year: 2015 end-page: 1018 ident: b0105 article-title: ESC: dataset for environmental sound classification publication-title: Proc. Int. Conf. Multimedia – start-page: 714 year: 2015 end-page: 718 ident: b0050 article-title: Improving event detection for audio surveillance using gabor filterbank features publication-title: Proc. Euro. Signal Process. Conf. – start-page: 261 year: 2019 end-page: 271 ident: b0160 article-title: Attention Based Convolutional Recurrent Neural Network for Environmental Sound Classification publication-title: Proc. Chinese Conf. Pattern Recognit. Comput. Vision – start-page: 469 year: 2017 end-page: 473 ident: b0055 article-title: Attention based CLDNNs for short-duration acoustic scene classification publication-title: Proc. Interspeech – reference: Y. Tokozume, Y. Ushiku, T. Harada, Learning from Between-Class Examples for Deep Sound Recognition, 2017. arXiv preprint arXiv:1711.10282. 
– volume: 32 start-page: 16 year: 2015 end-page: 34 ident: b0015 article-title: Acoustic scene classification: classifying environments from the sounds they produce publication-title: IEEE Signal Process. Mag. – reference: J.K. Chorowski, D. Bahdanau, D. Serdyuk, K. Cho, Y. Bengio, Attention-based Models for Speech Recognition, in: Proc. Int. Conf. Neural Inf. Process. Syst., 2015, pp. 577–585. – start-page: 356 year: 2018 ident: 10.1016/j.neucom.2020.08.069_b0155 article-title: Deep Convolutional Neural Network with Mixup for Environmental Sound Classification – volume: 27 start-page: 131 year: 2010 ident: 10.1016/j.neucom.2020.08.069_b0080 article-title: Machine hearing: an emerging field [Exploratory DSP] publication-title: IEEE Signal Process. Mag. doi: 10.1109/MSP.2010.937498 – volume: 26 start-page: 379 year: 2018 ident: 10.1016/j.neucom.2020.08.069_b0090 article-title: Detection and classification of acoustic scenes and events: outcome of the DCASE 2016 challenge publication-title: IEEE/ACM Trans. Audio Speech Lang. Process. doi: 10.1109/TASLP.2017.2778423 – start-page: 135 year: 2007 ident: 10.1016/j.neucom.2020.08.069_b0130 article-title: Sound classification in a smart room environment: an approach using GMM and HMM methods publication-title: Proc. 4th IEEE Conf. Speech Technique, Human-Computer Dialogue – ident: 10.1016/j.neucom.2020.08.069_b0150 – volume: 25 start-page: 1216 year: 2017 ident: 10.1016/j.neucom.2020.08.069_b0020 article-title: Feature learning with matrix factorization applied to acoustic scene classification publication-title: IEEE/ACM Trans. Audio Speech Language Process doi: 10.1109/TASLP.2017.2690570 – ident: 10.1016/j.neucom.2020.08.069_b0115 – ident: 10.1016/j.neucom.2020.08.069_b0140 – ident: 10.1016/j.neucom.2020.08.069_b0060 – volume: 17 start-page: 1142 year: 2009 ident: 10.1016/j.neucom.2020.08.069_b0035 article-title: Environmental sound recognition with time-frequency audio features publication-title: IEEE Trans. 
Audio Speech Language Process. doi: 10.1109/TASL.2009.2017438 – volume: 11 start-page: 716 year: 2011 ident: 10.1016/j.neucom.2020.08.069_b0045 article-title: Classification of audio signals using AANN and GMM publication-title: Appl. Soft Comput. doi: 10.1016/j.asoc.2009.12.033 – ident: 10.1016/j.neucom.2020.08.069_b0100 doi: 10.1145/2733373.2806390 – ident: 10.1016/j.neucom.2020.08.069_b0125 – ident: 10.1016/j.neucom.2020.08.069_b0010 – ident: 10.1016/j.neucom.2020.08.069_b0110 doi: 10.1109/ASPAA.2005.1540194 – volume: 112 start-page: 2048 year: 2017 ident: 10.1016/j.neucom.2020.08.069_b0025 article-title: Classifying environmental sounds using image recognition networks publication-title: Proc. Comput. Sci. doi: 10.1016/j.procs.2017.08.250 – ident: 10.1016/j.neucom.2020.08.069_b0065 – ident: 10.1016/j.neucom.2020.08.069_b0165 doi: 10.1109/IJCNN.2018.8489641 – start-page: 714 year: 2015 ident: 10.1016/j.neucom.2020.08.069_b0050 article-title: Improving event detection for audio surveillance using gabor filterbank features publication-title: Proc. Euro. Signal Process. Conf. – volume: 23 start-page: 540 year: 2015 ident: 10.1016/j.neucom.2020.08.069_b0085 article-title: Robust sound event classification using deep neural networks publication-title: IEEE/ACM Trans. Audio, Speech, Language Process. doi: 10.1109/TASLP.2015.2389618 – ident: 10.1016/j.neucom.2020.08.069_b0075 doi: 10.21437/Interspeech.2019-3019 – start-page: 1480 year: 2016 ident: 10.1016/j.neucom.2020.08.069_b0145 article-title: Hierarchical attention networks for document classification publication-title: Proc. NAACL-HLT – volume: 32 start-page: 16 year: 2015 ident: 10.1016/j.neucom.2020.08.069_b0015 article-title: Acoustic scene classification: classifying environments from the sounds they produce publication-title: IEEE Signal Process. Mag. 
doi: 10.1109/MSP.2014.2326181 – start-page: 261 year: 2019 ident: 10.1016/j.neucom.2020.08.069_b0160 article-title: Attention Based Convolutional Recurrent Neural Network for Environmental Sound Classification – start-page: 1015 year: 2015 ident: 10.1016/j.neucom.2020.08.069_b0105 article-title: ESC: dataset for environmental sound classification publication-title: Proc. Int. Conf. Multimedia – volume: 14 start-page: 1684 year: 2012 ident: 10.1016/j.neucom.2020.08.069_b0135 article-title: Gammatone cepstral coefficients: biologically inspired features for non-speech audio classification publication-title: IEEE Trans. Multimedia doi: 10.1109/TMM.2012.2199972 – start-page: 892 year: 2016 ident: 10.1016/j.neucom.2020.08.069_b0005 article-title: Soundnet: learning sound representations from unlabeled video – ident: 10.1016/j.neucom.2020.08.069_b0095 doi: 10.1109/MLSP.2015.7324337 – ident: 10.1016/j.neucom.2020.08.069_b0120 – ident: 10.1016/j.neucom.2020.08.069_b0030 – ident: 10.1016/j.neucom.2020.08.069_b0040 doi: 10.1109/ICASSP.2017.7952190 – start-page: 469 year: 2017 ident: 10.1016/j.neucom.2020.08.069_b0055 article-title: Attention based CLDNNs for short-duration acoustic scene classification publication-title: Proc. Interspeech – volume: 8 start-page: 1152 year: 2018 ident: 10.1016/j.neucom.2020.08.069_b0070 article-title: An ensemble stacked convolutional neural network model for environmental event sound recognition publication-title: Appl. Sci. doi: 10.3390/app8071152 |
| SSID | ssj0017129 |
| Score | 2.6142652 |
| SourceID | crossref elsevier |
| SourceType | Enrichment Source Index Database Publisher |
| StartPage | 896 |
| SubjectTerms | Attention mechanism Convolutional recurrent neural network Environmental sound classification |
| Title | Attention based convolutional recurrent neural network for environmental sound classification |
| URI | https://dx.doi.org/10.1016/j.neucom.2020.08.069 |
| Volume | 453 |
| WOSCitedRecordID | wos000663418300009&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| journalDatabaseRights | – providerCode: PRVESC databaseName: Elsevier SD Freedom Collection Journals 2021 customDbUrl: eissn: 1872-8286 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0017129 issn: 0925-2312 databaseCode: AIEXJ dateStart: 19950101 isFulltext: true titleUrlDefault: https://www.sciencedirect.com providerName: Elsevier |
| link | http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwtV1Lj9MwELZKlwMX3ohdHvKBGwpKnDixjxVaBAitQCyoQkKRX6GtlnTZTVb7S_i9jF9JqyJeEhercuPEmvnqmUxnvkHoiTGqYYw3SUEMDFlJE0lymqgGzFuhlTauPPrjm-roiM3n_O1k8j3WwlycVG3LLi_56X9VNcyBsm3p7F-oe7gpTMBnUDqMoHYY_0jxs64LKYzWQmmXVx4e6Ej8VWBkskyWMNH6PHBP_T1WvdkyEttx6amy7rXNJxpVuIqMTz1YP9cVIsQbZl8t7YK2GBviC0NE-tPCZuiv4_y8d4HXRf9lHYznVvh60bffluMX75Zi7aEl2niTEKsgmU2s8KWZPoC2U0TjI5GEJuBm-kPZ-HOYVcRVuG8e1AXNN45axssNq80dU8KuQfCxidUzkKnNDoJNpZ6ylY8GcEhLfG-3YndCLKNlmbEraI9UlLMp2pu9Opy_Hv6fqjLiWRzD1mNRpssc3H3Wz52eDUfm-Ca6Ht5A8Mwj5xaamPY2uhG7e-Bw2N9BnwcgYQckvAUkPAAJeyDhACQMQMJbQMIOSHgbSHfRhxeHx89fJqEZR6KKjHUJEaJsJGUypVoJKXmhUmoME6ShVBZUgrozJUya8wIutF1spKKSCCObrNAkv4em7bo19xHWJbiNqUlLQ2QBoha6lJwTDe95JtWV2Ed5lFatAlO9bZhyUseUxFXtZVxbGde2j2rJ91EyrDr1TC2_ub6KiqiDt-m9yBqw88uVB_-88gG6Nv4sHqJpd9abR-iquuiW52ePA8h-AN5srMc |
| linkProvider | Elsevier |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Attention+based+convolutional+recurrent+neural+network+for+environmental+sound+classification&rft.jtitle=Neurocomputing+%28Amsterdam%29&rft.au=Zhang%2C+Zhichao&rft.au=Xu%2C+Shugong&rft.au=Zhang%2C+Shunqing&rft.au=Qiao%2C+Tianhao&rft.date=2021-09-17&rft.pub=Elsevier+B.V&rft.issn=0925-2312&rft.eissn=1872-8286&rft.volume=453&rft.spage=896&rft.epage=903&rft_id=info:doi/10.1016%2Fj.neucom.2020.08.069&rft.externalDocID=S0925231220313618 |
| thumbnail_l | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/lc.gif&issn=0925-2312&client=summon |
| thumbnail_m | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/mc.gif&issn=0925-2312&client=summon |
| thumbnail_s | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/sc.gif&issn=0925-2312&client=summon |