Fusion-ConvBERT: Parallel Convolution and BERT Fusion for Speech Emotion Recognition
| Published in: | Sensors (Basel, Switzerland), Vol. 20, No. 22, p. 6688 |
|---|---|
| Main Authors: | Lee, Sanghyun; Han, David K.; Ko, Hanseok |
| Format: | Journal Article |
| Language: | English |
| Published: | Switzerland: MDPI AG, 23.11.2020 |
| Subjects: | Speech emotion recognition; BERT; Convolutional neural networks; Fusion model; Spatiotemporal representation |
| ISSN: | 1424-8220 |
| Abstract | Speech emotion recognition predicts the emotional state of a speaker based on the person’s speech. It brings an additional element for creating more natural human–computer interactions. Earlier studies on emotion recognition were primarily based on handcrafted features and manual labels. With the advent of deep learning, there have been efforts to apply deep-network-based approaches to the emotion recognition problem. As deep learning automatically extracts salient features correlated to speaker emotion, it brings certain advantages over handcrafted-feature-based methods. There are, however, challenges in applying deep networks to emotion recognition, because the data required to train them properly are often lacking. Therefore, there is a need for a new deep-learning-based approach which can exploit the information available in a given speech signal to the maximum extent possible. Our proposed method, called “Fusion-ConvBERT”, is a parallel fusion model consisting of bidirectional encoder representations from transformers (BERT) and convolutional neural networks (CNNs). Extensive experiments were conducted on the proposed model using the EMO-DB and Interactive Emotional Dyadic Motion Capture (IEMOCAP) emotion corpora, and the proposed method outperformed state-of-the-art techniques in most of the test configurations. |
|---|---|
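The abstract's central claim is architectural: two heterogeneous encoders run in parallel over the same speech signal, and their representations are fused before classification. As a concrete illustration, the sketch below implements a parallel CNN + transformer-encoder fusion in PyTorch. The input shape (a spectrogram-like time–frequency matrix), the layer sizes, the mean pooling, and the concatenation-based fusion head are illustrative assumptions, not the paper's exact Fusion-ConvBERT configuration.

```python
# Illustrative sketch (not the authors' released code) of a parallel
# CNN + transformer-encoder fusion for speech emotion recognition.
# All layer sizes and the concat-based late fusion are assumptions.
import torch
import torch.nn as nn

class ParallelFusionSER(nn.Module):
    def __init__(self, n_mels=64, d_model=128, n_classes=7):
        super().__init__()
        # Branch 1: a 2-D CNN over the (time, frequency) plane,
        # capturing local spatiotemporal patterns.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),          # -> (B, 64, 1, 1)
        )
        # Branch 2: a BERT-style self-attention encoder over the
        # frame sequence, capturing long-range temporal context.
        self.frame_proj = nn.Linear(n_mels, d_model)
        enc_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        # Fusion head: concatenate the two utterance embeddings.
        self.classifier = nn.Linear(64 + d_model, n_classes)

    def forward(self, spec):                  # spec: (B, T, n_mels)
        c = self.cnn(spec.unsqueeze(1)).flatten(1)           # (B, 64)
        t = self.encoder(self.frame_proj(spec)).mean(dim=1)  # (B, d_model)
        return self.classifier(torch.cat([c, t], dim=-1))    # (B, n_classes)

# Usage: a batch of 4 utterances, 200 frames, 64 mel bins.
logits = ParallelFusionSER()(torch.randn(4, 200, 64))
print(logits.shape)  # torch.Size([4, 7])
```

The two branches see the same input but specialize differently: the CNN captures local time–frequency patterns, while the self-attention encoder models long-range temporal dependencies. This complementarity is what the parallel fusion design aims to exploit.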
| Author | Lee, Sanghyun; Han, David K.; Ko, Hanseok |
| AuthorAffiliation | 1. Department of Electronics and Electrical Engineering, Korea University, Seoul 136-713, Korea (shlee@ispl.korea.ac.kr); 2. Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA 19104, USA (dkh42@drexel.edu) |
| Copyright | 2020 by the authors. |
| DOI | 10.3390/s20226688 |
| Discipline | Engineering |
| EISSN | 1424-8220 |
| GrantInformation | National Research Foundation (NRF) grant funded by the MSIP of Korea (grant no. 2019R1A2C2009480) |
| ISSN | 1424-8220 |
| Issue | 22 |
| Keywords | bidirectional encoder representations from transformers (BERT); convolutional neural networks (CNNs); speech emotion recognition; fusion model; transformer representation; spatiotemporal representation |
| Language | English |
| License | Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
| ORCID | 0000-0002-8744-4514 (Ko, Hanseok) |
| OpenAccessLink | https://doaj.org/article/37e5f2862a00489988eafe9d6c25cce6 |
| PMID | 33238396 |
| PublicationDate | 2020-11-23 |
| PublicationPlace | Switzerland |
| PublicationTitle | Sensors (Basel, Switzerland) |
| PublicationTitleAlternate | Sensors (Basel) |
| PublicationYear | 2020 |
| Publisher | MDPI AG |
| StartPage | 6688 |
| SubjectTerms | Accuracy; bidirectional encoder representations from transformers (BERT); convolutional neural networks (CNNs); Deep learning; Emotions; Experiments; Humans; Neural networks; Neural Networks, Computer; representation; Signal processing; spatiotemporal representation; Speech; speech emotion recognition; transformer |
| Title | Fusion-ConvBERT: Parallel Convolution and BERT Fusion for Speech Emotion Recognition |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/33238396; https://www.proquest.com/docview/2464936954; https://www.proquest.com/docview/2464604321; https://pubmed.ncbi.nlm.nih.gov/PMC7700332; https://doaj.org/article/37e5f2862a00489988eafe9d6c25cce6 |
| Volume | 20 |