Fusion-ConvBERT: Parallel Convolution and BERT Fusion for Speech Emotion Recognition

Bibliographic Details
Published in: Sensors (Basel, Switzerland), Vol. 20, No. 22, p. 6688
Main Authors: Lee, Sanghyun; Han, David K.; Ko, Hanseok
Format: Journal Article
Language: English
Published: Basel, Switzerland: MDPI AG, 23 November 2020
ISSN: 1424-8220

Abstract
Speech emotion recognition predicts the emotional state of a speaker from the person’s speech. It adds an element for creating more natural human–computer interactions. Earlier studies on emotion recognition were based primarily on handcrafted features and manual labels. With the advent of deep learning, there have been efforts to apply deep-network-based approaches to emotion recognition. Because deep learning automatically extracts salient features correlated with speaker emotion, it offers certain advantages over handcrafted-feature-based methods. There are, however, challenges in applying it to emotion recognition, because the data required to properly train deep networks are often lacking. There is therefore a need for a new deep-learning-based approach that can exploit the information available in a given speech signal to the maximum extent possible. Our proposed method, called “Fusion-ConvBERT”, is a parallel fusion model consisting of bidirectional encoder representations from transformers (BERT) and convolutional neural networks (CNNs). Extensive experiments were conducted on the proposed model using the EMO-DB and Interactive Emotional Dyadic Motion Capture (IEMOCAP) emotion corpora, and the proposed method outperformed state-of-the-art techniques in most of the test configurations.
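
The abstract specifies the architecture only at a high level: a convolutional branch and a BERT-style (encoder-only, bidirectional) transformer branch process the same speech input in parallel, and their representations are fused for emotion classification. The following is a minimal PyTorch sketch of that parallel-fusion idea, not the authors' implementation; the log-mel input shape, layer sizes, concatenation-based fusion, and the four-class output are all illustrative assumptions.

# Minimal sketch of a parallel CNN + transformer fusion classifier for
# speech emotion recognition, in the spirit of Fusion-ConvBERT. NOT the
# paper's implementation: input shape (log-mel spectrogram frames),
# layer sizes, concatenation-based fusion, and the 4 emotion classes
# are illustrative assumptions.
import torch
import torch.nn as nn


class ParallelFusionSER(nn.Module):
    def __init__(self, n_mels=64, d_model=128, n_classes=4):
        super().__init__()
        # CNN branch: treats the spectrogram as a 1-channel image and
        # pools it down to a single feature vector.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # -> (batch, 64, 1, 1)
        )
        # Transformer branch: treats each time frame (n_mels features)
        # as a token, BERT-style (encoder-only, bidirectional attention).
        self.proj = nn.Linear(n_mels, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=2)
        # Fusion head: concatenate the two branch embeddings in parallel.
        self.classifier = nn.Linear(64 + d_model, n_classes)

    def forward(self, spec):
        # spec: (batch, time, n_mels) log-mel spectrogram.
        cnn_feat = self.cnn(spec.unsqueeze(1)).flatten(1)  # (batch, 64)
        tokens = self.transformer(self.proj(spec))         # (batch, time, d_model)
        trans_feat = tokens.mean(dim=1)                    # (batch, d_model)
        fused = torch.cat([cnn_feat, trans_feat], dim=1)   # parallel fusion
        return self.classifier(fused)                      # emotion logits


# Example: a batch of two 3-second utterances (300 frames x 64 mel bins).
model = ParallelFusionSER()
logits = model(torch.randn(2, 300, 64))
print(logits.shape)  # torch.Size([2, 4])
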
AuthorAffiliation:
1 Department of Electronics and Electrical Engineering, Korea University, Seoul 136-713, Korea; shlee@ispl.korea.ac.kr
2 Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA 19104, USA; dkh42@drexel.edu
Copyright: 2020 by the authors. This work is licensed under http://creativecommons.org/licenses/by/3.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI: 10.3390/s20226688
Discipline: Engineering
EISSN: 1424-8220
PMCID: PMC7700332
GrantInformation: National Research Foundation (NRF) grant funded by the MSIP of Korea (grant 2019R1A2C2009480)
Keywords: bidirectional encoder representations from transformers (BERT); convolutional neural networks (CNNs); speech emotion recognition; fusion model; transformer; representation; spatiotemporal representation
License: Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
ORCID: 0000-0002-8744-4514 (Ko, Hanseok)
OpenAccessLink: https://doaj.org/article/37e5f2862a00489988eafe9d6c25cce6
PMID: 33238396
SubjectTerms: Accuracy; bidirectional encoder representations from transformers (BERT); convolutional neural networks (CNNs); Deep learning; Emotions; Experiments; Humans; Neural networks; Neural Networks, Computer; representation; Signal processing; spatiotemporal representation; Speech; speech emotion recognition; transformer
URI:
https://www.ncbi.nlm.nih.gov/pubmed/33238396
https://www.proquest.com/docview/2464936954
https://www.proquest.com/docview/2464604321
https://pubmed.ncbi.nlm.nih.gov/PMC7700332
https://doaj.org/article/37e5f2862a00489988eafe9d6c25cce6