A novel transformer autoencoder for multi-modal emotion recognition with incomplete data

Multi-modal signals have become essential data for emotion recognition since they can represent emotions more comprehensively. However, in real-world environments, it is often impossible to acquire complete data on multi-modal signals, and the problem of missing modalities causes severe performance...


Bibliographic Details
Published in: Neural Networks, Vol. 172, p. 106111
Main Authors: Cheng, Cheng, Liu, Wenzhe, Fan, Zhaoxin, Feng, Lin, Jia, Ziyu
Format: Journal Article
Language: English
Published: Elsevier Ltd, United States, 01.04.2024
Subjects:
ISSN: 0893-6080, 1879-2782
Abstract Multi-modal signals have become essential data for emotion recognition since they can represent emotions more comprehensively. However, in real-world environments it is often impossible to acquire complete multi-modal data, and missing modalities cause severe performance degradation in emotion recognition. This paper therefore represents the first attempt to use a transformer-based architecture to fill in modality-incomplete data from partially observed data for multi-modal emotion recognition (MER). Concretely, the paper proposes a novel unified model, the transformer autoencoder (TAE), comprising a modality-specific hybrid transformer encoder, an inter-modality transformer encoder, and a convolutional decoder. The modality-specific hybrid transformer encoder bridges a convolutional encoder and a transformer encoder, allowing it to learn both local and global context information within each modality. The inter-modality transformer encoder builds and aligns global cross-modal correlations and models long-range contextual information across modalities. The convolutional decoder decodes the encoded features to produce more precise recognition. In addition, a regularization term is introduced into the convolutional decoder to force it to fully leverage both complete and incomplete data for emotion recognition on missing data. Accuracies of 96.33%, 95.64%, and 92.69% are attained on the available data of the DEAP and SEED-IV datasets, and accuracies of 93.25%, 92.23%, and 81.76% are obtained on the missing data. In particular, the model achieves a 5.61% advantage with 70% of the data missing, demonstrating that it outperforms several state-of-the-art approaches in incomplete multi-modal learning.
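To make the architecture described in the abstract concrete, the following is a minimal illustrative sketch of a TAE-style model in PyTorch. All module names, layer sizes, and hyper-parameters here (ModalityHybridEncoder, TAESketch, d_model=64, two transformer layers, and so on) are assumptions made for illustration only; they are not taken from the authors' implementation.

```python
# Minimal sketch of a TAE-style model: per-modality hybrid encoders
# (convolution + transformer), a shared inter-modality transformer over the
# concatenated token sequences, and a convolutional decoder plus classifier.
# All names and sizes are hypothetical.
import torch
import torch.nn as nn


class ModalityHybridEncoder(nn.Module):
    """Hypothetical modality-specific hybrid encoder: a 1-D convolutional stem
    for local context followed by a transformer encoder for global context."""

    def __init__(self, in_channels: int, d_model: int = 64, n_layers: int = 2):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels, d_model, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(d_model, d_model, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time) -> tokens of shape (batch, time, d_model)
        h = self.conv(x).transpose(1, 2)
        return self.transformer(h)


class TAESketch(nn.Module):
    """Hypothetical two-modality TAE: modality-specific encoders, an
    inter-modality transformer, and convolutional decoders that reconstruct
    both input streams alongside an emotion classifier."""

    def __init__(self, ch_a: int, ch_b: int, d_model: int = 64, n_classes: int = 4):
        super().__init__()
        self.enc_a = ModalityHybridEncoder(ch_a, d_model)
        self.enc_b = ModalityHybridEncoder(ch_b, d_model)
        inter_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.inter = nn.TransformerEncoder(inter_layer, num_layers=2)
        self.dec_a = nn.Conv1d(d_model, ch_a, kernel_size=3, padding=1)
        self.dec_b = nn.Conv1d(d_model, ch_b, kernel_size=3, padding=1)
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, x_a: torch.Tensor, x_b: torch.Tensor):
        h_a, h_b = self.enc_a(x_a), self.enc_b(x_b)           # (B, T, D) each
        fused = self.inter(torch.cat([h_a, h_b], dim=1))      # cross-modal tokens
        t = h_a.size(1)
        rec_a = self.dec_a(fused[:, :t].transpose(1, 2))      # reconstruct modality A
        rec_b = self.dec_b(fused[:, t:].transpose(1, 2))      # reconstruct modality B
        logits = self.classifier(fused.mean(dim=1))           # emotion prediction
        return logits, rec_a, rec_b
```

Concatenating the two modalities' token sequences before a shared transformer is one plausible reading of the "inter-modality transformer encoder"; the paper may align and fuse the modalities differently.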
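The regularization described in the abstract, which forces the decoder to leverage both complete and incomplete data, could be exercised with a training objective along the following lines. This is a sketch under stated assumptions (zero-filling the missing stream, an MSE reconstruction term with a hypothetical weight lam), not the paper's exact loss.

```python
# Sketch of a training objective for missing-modality batches: the missing
# modality is zero-filled at the input, and an MSE reconstruction regularizer
# (weight `lam`, a hypothetical value) is added to the classification loss.
import torch
import torch.nn.functional as F


def tae_loss(model, x_a, x_b, labels, b_observed, lam: float = 0.5):
    """Compute classification + reconstruction loss for one batch.

    b_observed: (batch,) bool tensor, True where modality B was recorded.
    Missingness is simulated here, so the ground-truth x_b remains available
    as the reconstruction target.
    """
    x_b_in = x_b * b_observed.float().view(-1, 1, 1)   # zero-fill missing modality B
    logits, rec_a, rec_b = model(x_a, x_b_in)
    cls = F.cross_entropy(logits, labels)
    reg = F.mse_loss(rec_a, x_a) + F.mse_loss(rec_b, x_b)
    return cls + lam * reg


# Usage with the TAESketch above (shapes are illustrative only):
#   model = TAESketch(ch_a=32, ch_b=4, n_classes=4)
#   x_a = torch.randn(8, 32, 128); x_b = torch.randn(8, 4, 128)
#   labels = torch.randint(0, 4, (8,)); b_observed = torch.rand(8) > 0.3
#   loss = tae_loss(model, x_a, x_b, labels, b_observed)
```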
ArticleNumber 106111
Author Cheng, Cheng
Jia, Ziyu
Liu, Wenzhe
Fan, Zhaoxin
Feng, Lin
Author_xml – sequence: 1
  givenname: Cheng
  orcidid: 0000-0002-2138-6286
  surname: Cheng
  fullname: Cheng, Cheng
  organization: Department of Computer Science and Technology, Dalian University of Technology, Dalian, China
– sequence: 2
  givenname: Wenzhe
  surname: Liu
  fullname: Liu, Wenzhe
  organization: School of Information Engineering, Huzhou University, Huzhou, China
– sequence: 3
  givenname: Zhaoxin
  surname: Fan
  fullname: Fan, Zhaoxin
  organization: Renmin University of China, Psyche AI Inc, Beijing, China
– sequence: 4
  givenname: Lin
  surname: Feng
  fullname: Feng, Lin
  email: fenglin@dlut.edu.cn
  organization: Department of Computer Science and Technology, Dalian University of Technology, Dalian, China
– sequence: 5
  givenname: Ziyu
  surname: Jia
  fullname: Jia, Ziyu
  organization: Brainnetome Center, Institute of Automation, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing, China
BackLink https://www.ncbi.nlm.nih.gov/pubmed/38237444$$D View this record in MEDLINE/PubMed
BookMark eNqFkE1P3DAQhq0K1F1o_wFCOfaSrb_iOD1UQghoJSQuIHGzHHvSepXYi-1Q9d_XS9gLh3Ka0eh9XmmeE3TkgweEzgjeEEzE1-3Gw-whbyimvJwEIeQDWhPZdjVtJT1Cayw7Vgss8QqdpLTFGAvJ2Ue0YpKylnO-Ro8XlQ_PMFY5ap-GECeIlZ5zAG-CLXs5VdM8ZldPweqxgilkF3wVwYRf3r3sf1z-XbkCTLsRMlRWZ_0JHQ96TPD5dZ6ih-ur-8sf9e3dzc_Li9vaMEFz3Q1UWAEUQy8aSi2XWoiub7CkjeaiJ6ZhwoDosbAG2rZnFpvODI1gDCTF7BR9WXp3MTzNkLKaXDIwjtpDmJOiHe1wI1vGS_T8NTr3E1i1i27S8a862CiBb0vAxJBShEEZl_X-xWLHjYpgtVevtmpRr_bq1aK-wPwNfOh_B_u-YFAkPTuIKhlX5IN1RXFWNrj_F_wDcYegEg
CitedBy_id crossref_primary_10_3390_s25030761
crossref_primary_10_1016_j_eswa_2024_125822
crossref_primary_10_3389_fncir_2025_1574877
crossref_primary_10_1007_s11571_024_10167_0
crossref_primary_10_1097_DM_2024_00001
crossref_primary_10_3390_sym16040471
crossref_primary_10_1016_j_neunet_2024_106624
crossref_primary_10_3389_frobt_2025_1462243
crossref_primary_10_70401_ec_2025_0010
crossref_primary_10_1016_j_neunet_2024_106837
crossref_primary_10_1007_s10115_025_02354_0
crossref_primary_10_1016_j_neucom_2024_129205
crossref_primary_10_1007_s11571_025_10277_3
crossref_primary_10_1016_j_knosys_2025_114182
crossref_primary_10_1016_j_neunet_2025_107267
crossref_primary_10_1021_acs_jpclett_5c02109
crossref_primary_10_1016_j_neunet_2025_107596
crossref_primary_10_1016_j_neunet_2024_106784
crossref_primary_10_1016_j_eswa_2024_125089
Cites_doi 10.1109/CVPRW56347.2022.00278
10.1109/TPAMI.2020.3037734
10.3389/fnhum.2023.1169949
10.1109/LSP.2022.3179946
10.1109/CVPRW56347.2022.00511
10.1109/TCDS.2021.3071170
10.1016/j.neunet.2019.10.010
10.1109/T-AFFC.2011.15
10.1609/aaai.v35i11.17231
10.1016/j.neuroimage.2012.03.059
10.32604/iasc.2023.025437
10.21437/Interspeech.2020-1190
10.1109/T-AFFC.2011.37
10.1109/CVPR52729.2023.01524
10.1137/080738970
10.1109/JSTARS.2017.2714338
10.1109/ACCESS.2021.3092735
10.1109/CVPR46437.2021.01102
10.1145/3474085.3475585
10.1145/3551626.3564965
10.1088/1741-2552/ac49a7
10.1609/aaai.v35i3.16330
10.1109/TCYB.2018.2797176
10.1109/CVPR52688.2022.02039
10.1109/CVPR46437.2021.00561
10.1016/j.specom.2022.02.006
10.1145/2487575.2487594
ContentType Journal Article
Copyright 2024 Elsevier Ltd
Copyright © 2024 Elsevier Ltd. All rights reserved.
Copyright_xml – notice: 2024 Elsevier Ltd
– notice: Copyright © 2024 Elsevier Ltd. All rights reserved.
DBID AAYXX
CITATION
NPM
7X8
DOI 10.1016/j.neunet.2024.106111
DatabaseName CrossRef
PubMed
MEDLINE - Academic
DatabaseTitle CrossRef
PubMed
MEDLINE - Academic
DatabaseTitleList MEDLINE - Academic

PubMed
Database_xml – sequence: 1
  dbid: NPM
  name: PubMed
  url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
– sequence: 2
  dbid: 7X8
  name: MEDLINE - Academic
  url: https://search.proquest.com/medline
  sourceTypes: Aggregation Database
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISSN 1879-2782
ExternalDocumentID 38237444
10_1016_j_neunet_2024_106111
S089360802400025X
Genre Journal Article
GroupedDBID ---
--K
--M
-~X
.DC
.~1
0R~
123
186
1B1
1RT
1~.
1~5
29N
4.4
457
4G.
53G
5RE
5VS
6TJ
7-5
71M
8P~
9JM
9JN
AABNK
AACTN
AAEDT
AAEDW
AAIKJ
AAKOC
AALRI
AAOAW
AAQFI
AAQXK
AAXKI
AAXLA
AAXUO
AAYFN
ABAOU
ABBOA
ABCQJ
ABDPE
ABEFU
ABFNM
ABFRF
ABHFT
ABIVO
ABJNI
ABLJU
ABMAC
ABXDB
ACDAQ
ACGFO
ACGFS
ACIUM
ACNNM
ACRLP
ACZNC
ADBBV
ADEZE
ADGUI
ADJOM
ADMUD
ADRHT
AEBSH
AECPX
AEFWE
AEKER
AENEX
AFJKZ
AFKWA
AFTJW
AFXIZ
AGHFR
AGUBO
AGWIK
AGYEJ
AHHHB
AHJVU
AHZHX
AIALX
AIEXJ
AIKHN
AITUG
AJOXV
AKRWK
ALMA_UNASSIGNED_HOLDINGS
AMFUW
AMRAJ
AOUOD
ARUGR
ASPBG
AVWKF
AXJTR
AZFZN
BJAXD
BKOJK
BLXMC
CS3
DU5
EBS
EFJIC
EJD
EO8
EO9
EP2
EP3
F0J
F5P
FDB
FEDTE
FGOYB
FIRID
FNPLU
FYGXN
G-2
G-Q
GBLVA
GBOLZ
HLZ
HMQ
HVGLF
HZ~
IHE
J1W
JJJVA
K-O
KOM
KZ1
LG9
LMP
M2V
M41
MHUIS
MO0
MOBAO
MVM
N9A
O-L
O9-
OAUVE
OZT
P-8
P-9
P2P
PC.
Q38
R2-
RIG
ROL
RPZ
SBC
SCC
SDF
SDG
SDP
SES
SEW
SNS
SPC
SPCBC
SSN
SST
SSV
SSW
SSZ
T5K
TAE
UAP
UNMZH
VOH
WUQ
XPP
ZMT
~G-
9DU
AATTM
AAYWO
AAYXX
ABWVN
ACLOT
ACRPL
ACVFH
ADCNI
ADNMO
AEIPS
AEUPX
AFPUW
AGQPQ
AIGII
AIIUN
AKBMS
AKYEP
ANKPU
APXCP
CITATION
EFKBS
EFLBG
~HD
AGCQF
AGRNS
BNPGV
NPM
SSH
7X8
ID FETCH-LOGICAL-c362t-9f26d6e20eb6522d48a669b50825a46b1c536ce6b06dce77b3d0c9cf5633e8203
ISICitedReferencesCount 23
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001163939200001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
ISSN 0893-6080
1879-2782
IngestDate Thu Oct 02 12:07:49 EDT 2025
Mon Jul 21 05:55:30 EDT 2025
Sat Nov 29 05:33:06 EST 2025
Tue Nov 18 21:49:07 EST 2025
Sat Nov 09 15:59:23 EST 2024
IsPeerReviewed true
IsScholarly true
Keywords Emotion recognition
Incomplete data
Multi-modal signals
Convolutional encoder
Transformer autoencoder
Language English
License Copyright © 2024 Elsevier Ltd. All rights reserved.
LinkModel OpenURL
MergedId FETCHMERGED-LOGICAL-c362t-9f26d6e20eb6522d48a669b50825a46b1c536ce6b06dce77b3d0c9cf5633e8203
Notes ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
ORCID 0000-0002-2138-6286
PMID 38237444
PQID 2929058734
PQPubID 23479
ParticipantIDs proquest_miscellaneous_2929058734
pubmed_primary_38237444
crossref_citationtrail_10_1016_j_neunet_2024_106111
crossref_primary_10_1016_j_neunet_2024_106111
elsevier_sciencedirect_doi_10_1016_j_neunet_2024_106111
PublicationCentury 2000
PublicationDate 2024-04-01
PublicationDateYYYYMMDD 2024-04-01
PublicationDate_xml – month: 04
  year: 2024
  text: 2024-04-01
  day: 01
PublicationDecade 2020
PublicationPlace United States
PublicationPlace_xml – name: United States
PublicationTitle Neural networks
PublicationTitleAlternate Neural Netw
PublicationYear 2024
Publisher Elsevier Ltd
Publisher_xml – name: Elsevier Ltd
References Liu, Qiu, Zheng, Lu (b23) 2022; 14
(pp. 21064–21075).
Praveen, R. G., de Melo, W. C., Ullah, N., Aslam, H., Zeeshan, O., Denorme, T., et al. (2022a). A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional Emotion Recognition. In
Liu, Zheng, Lu (b24) 2016
Kang, Zhao, Peng, Zhu, Zhou, Peng (b11) 2020; 122
Lopez-Paz, Sra, Smola, Ghahramani, Schölkopf (b25) 2014
(b33) 2022; 139
Lian, Liu, Tao (b18) 2022
Wu, Zheng, Li, Lu (b42) 2022; 19
(pp. 4400–4407).
(pp. 1–5).
Lee, Lin, Hsu, Hsu (b16) 2019
Lin, Y., Gou, Y., Liu, Z., Li, B., Lv, J., & Peng, X. (2021). COMPLETER: Incomplete multi-view clustering via contrastive prediction. In
Parthasarathy, Sundaram (b34) 2020
Cheng, Zhang, Liu, Liu, Feng (b4) 2022
Liu, Qiu, Zheng, Lu (b21) 2019
(pp. 2486–2495).
Soleymani, Pantic, Pun (b37) 2011; 3
Yuan, Wang, Thompson, Narayan, Ye, Initiative (b46) 2012; 61
Andrew, Arora, Bilmes, Livescu (b1) 2013
Mocanu, Tapu (b32) 2022
Lee, Han, Ko (b15) 2021; 9
Yuan, Z., Li, W., Xu, H., & Yu, W. (2021). Transformer-based feature reconstruction network for robust multimodal sentiment analysis. In
Ma, Huang, Zhang (b27) 2021
Ma, Xu, Huang, Zhang (b29) 2021
Koelstra, Muhl, Soleymani, Lee, Yazdani, Ebrahimi (b13) 2011; 3
(pp. 4652–4661).
Makiuchi, Uto, Shinoda (b30) 2021
Luo, Xu, Lai (b26) 2023
Xu, Li, Ren, Peng, Mo, Shi (b44) 2022
Cheng, Yu, Zhang, Feng (b3) 2023
Fan, Chen, Guo, Zhang, Kuang (b6) 2017; 10
Wang, Liu, Zhang, Zhang, Guo (b40) 2022
(pp. 4243–4247).
Chudasama, V., Kar, P., Gudmalwar, A., Shah, N., Wasnik, P., & Onoe, N. (2022). M2FNet: Multi-Modal Fusion Network for Emotion Recognition in Conversation. In
Cai, Candès, Shen (b2) 2010; 20
Zhang, Cui, Han, Zhou, Fu, Hu (b47) 2020
Zheng, Liu, Lu, Lu, Cichocki (b48) 2018; 49
Xiang, S., Yuan, L., Fan, W., Wang, Y., Thompson, P. M., & Ye, J. (2013). Multi-source learning with block-wise missing data for Alzheimer’s disease prediction. In
Gupta, V., Mittal, T., Mathur, P., Mishra, V., Maheshwari, M., Bera, A., et al. (2022). 3MASSIV: Multilingual, multimodal and multi-aspect dataset of social media short videos. In
Liu, Li, Tang, Xia, Xiong, Liu (b20) 2020; 43
(pp. 5661–5671).
Wang, H., Chen, Y., Ma, C., Avery, J., Hull, L., & Carneiro, G. (2023). Multi-Modal Learning With Missing Modality via Shared-Specific Feature Modelling. In
Hotelling (b9) 1992
Liu, Qiu, Zheng, Lu (b22) 2021; 14
Mittal, T., Mathur, P., Bera, A., & Manocha, D. (2021). Affect2mm: Affective analysis of multimedia content using emotion causality. In
Kavitha, RajivKannan (b12) 2023; 35
(pp. 15878–15887).
(pp. 185–193).
Gao, Fu, Ouyang, Wang (b7) 2022; 29
(pp. 10273–10281).
Ma, M., Ren, J., Zhao, L., Tulyakov, S., Wu, C., & Peng, X. (2021). Smil: Multimodal learning with severely missing modality. In
Krishna, D., & Patil, A. (2020). Multimodal Emotion Recognition Using Cross-Modal Attention and 1D Convolutional Neural Networks. In
Li, Pan, Huang, Pan, Wang (b17) 2023; 17
Praveen, R. G., de Melo, W. C., Ullah, N., Aslam, H., Zeeshan, O., Denorme, T., et al. (2022b). A joint cross-attention model for audio-visual fusion in dimensional emotion recognition. In
Wen, J., Zhang, Z., Zhang, Z., Zhu, L., Fei, L., Zhang, B., et al. (2021). Unified tensor framework for incomplete multi-view clustering and missing-view inferring. In
Wang, Ding, Tao, Gao, Fu (b39) 2018
John, V., & Kawanishi, Y. (2022). A Multimodal Sensor Fusion Framework Robust to Missing Modalities for Person Recognition. In
(pp. 11174–11183).
(pp. 2302–2310).
Lian (10.1016/j.neunet.2024.106111_b18) 2022
Zhang (10.1016/j.neunet.2024.106111_b47) 2020
Lee (10.1016/j.neunet.2024.106111_b15) 2021; 9
Cai (10.1016/j.neunet.2024.106111_b2) 2010; 20
Cheng (10.1016/j.neunet.2024.106111_b4) 2022
10.1016/j.neunet.2024.106111_b28
Lee (10.1016/j.neunet.2024.106111_b16) 2019
Liu (10.1016/j.neunet.2024.106111_b20) 2020; 43
Xu (10.1016/j.neunet.2024.106111_b44) 2022
10.1016/j.neunet.2024.106111_b35
10.1016/j.neunet.2024.106111_b14
Mocanu (10.1016/j.neunet.2024.106111_b32) 2022
(10.1016/j.neunet.2024.106111_b33) 2022; 139
10.1016/j.neunet.2024.106111_b36
Parthasarathy (10.1016/j.neunet.2024.106111_b34) 2020
Ma (10.1016/j.neunet.2024.106111_b29) 2021
Cheng (10.1016/j.neunet.2024.106111_b3) 2023
10.1016/j.neunet.2024.106111_b31
10.1016/j.neunet.2024.106111_b10
Luo (10.1016/j.neunet.2024.106111_b26) 2023
Wang (10.1016/j.neunet.2024.106111_b40) 2022
Wu (10.1016/j.neunet.2024.106111_b42) 2022; 19
Wang (10.1016/j.neunet.2024.106111_b39) 2018
Liu (10.1016/j.neunet.2024.106111_b22) 2021; 14
Kavitha (10.1016/j.neunet.2024.106111_b12) 2023; 35
Andrew (10.1016/j.neunet.2024.106111_b1) 2013
Hotelling (10.1016/j.neunet.2024.106111_b9) 1992
Koelstra (10.1016/j.neunet.2024.106111_b13) 2011; 3
10.1016/j.neunet.2024.106111_b19
Ma (10.1016/j.neunet.2024.106111_b27) 2021
10.1016/j.neunet.2024.106111_b5
Li (10.1016/j.neunet.2024.106111_b17) 2023; 17
Makiuchi (10.1016/j.neunet.2024.106111_b30) 2021
10.1016/j.neunet.2024.106111_b38
Yuan (10.1016/j.neunet.2024.106111_b46) 2012; 61
10.1016/j.neunet.2024.106111_b8
Liu (10.1016/j.neunet.2024.106111_b21) 2019
Zheng (10.1016/j.neunet.2024.106111_b48) 2018; 49
Liu (10.1016/j.neunet.2024.106111_b23) 2022; 14
Liu (10.1016/j.neunet.2024.106111_b24) 2016
10.1016/j.neunet.2024.106111_b45
Fan (10.1016/j.neunet.2024.106111_b6) 2017; 10
Lopez-Paz (10.1016/j.neunet.2024.106111_b25) 2014
Soleymani (10.1016/j.neunet.2024.106111_b37) 2011; 3
10.1016/j.neunet.2024.106111_b41
10.1016/j.neunet.2024.106111_b43
Kang (10.1016/j.neunet.2024.106111_b11) 2020; 122
Gao (10.1016/j.neunet.2024.106111_b7) 2022; 29
References_xml – reference: Xiang, S., Yuan, L., Fan, W., Wang, Y., Thompson, P. M., & Ye, J. (2013). Multi-source learning with block-wise missing data for Alzheimer’s disease prediction. In
– volume: 43
  start-page: 2634
  year: 2020
  end-page: 2646
  ident: b20
  article-title: Efficient and effective regularized incomplete multi-view clustering
  publication-title: IEEE Transactions on Pattern Analysis and Machine Intelligence
– reference: (pp. 10273–10281).
– reference: (pp. 15878–15887).
– reference: (pp. 2302–2310).
– year: 2022
  ident: b18
  article-title: Smin: Semi-supervised multi-modal interaction network for conversational emotion recognition
  publication-title: IEEE Transactions on Affective Computing
– reference: Mittal, T., Mathur, P., Bera, A., & Manocha, D. (2021). Affect2mm: Affective analysis of multimedia content using emotion causality. In
– volume: 17
  year: 2023
  ident: b17
  article-title: STGATE: Spatial-temporal graph attention network with a transformer encoder for EEG-based emotion recognition
  publication-title: Frontiers in Human Neuroscience
– start-page: 162
  year: 1992
  end-page: 190
  ident: b9
  article-title: Relations between two sets of variates
  publication-title: Breakthroughs in statistics: methodology and distribution
– reference: Krishna, D., & Patil, A. (2020). Multimodal Emotion Recognition Using Cross-Modal Attention and 1D Convolutional Neural Networks. In
– year: 2022
  ident: b4
  article-title: Multi-domain encoding of spatiotemporal dynamics in EEG for emotion recognition
  publication-title: IEEE Journal of Biomedical and Health Informatics
– volume: 10
  start-page: 4589
  year: 2017
  end-page: 4604
  ident: b6
  article-title: Hyperspectral image restoration using low-rank tensor recovery
  publication-title: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
– reference: Chudasama, V., Kar, P., Gudmalwar, A., Shah, N., Wasnik, P., & Onoe, N. (2022). M2FNet: Multi-Modal Fusion Network for Emotion Recognition in Conversation. In
– reference: (pp. 1–5).
– reference: (pp. 4652–4661).
– volume: 20
  start-page: 1956
  year: 2010
  end-page: 1982
  ident: b2
  article-title: A singular value thresholding algorithm for matrix completion
  publication-title: SIAM Journal on optimization
– volume: 139
  start-page: 1
  year: 2022
  end-page: 9
  ident: b33
  article-title: Multi-modal speech emotion recognition using self-attention mechanism and multi-scale fusion framework
  publication-title: Speech Communication
– reference: Yuan, Z., Li, W., Xu, H., & Yu, W. (2021). Transformer-based feature reconstruction network for robust multimodal sentiment analysis. In
– volume: 14
  start-page: 715
  year: 2021
  end-page: 729
  ident: b22
  article-title: Comparing recognition performance and robustness of multimodal deep learning models for multimodal emotion recognition
  publication-title: IEEE Transactions on Cognitive and Developmental Systems
– reference: (pp. 4400–4407).
– reference: Gupta, V., Mittal, T., Mathur, P., Mishra, V., Maheshwari, M., Bera, A., et al. (2022). 3MASSIV: Multilingual, multimodal and multi-aspect dataset of social media short videos. In
– year: 2020
  ident: b47
  article-title: Deep partial multi-view learning
  publication-title: IEEE Transactions on Pattern Analysis and Machine Intelligence
– start-page: 1247
  year: 2013
  end-page: 1255
  ident: b1
  article-title: Deep canonical correlation analysis
  publication-title: International conference on machine learning
– start-page: 1359
  year: 2014
  end-page: 1367
  ident: b25
  article-title: Randomized nonlinear component analysis
  publication-title: International conference on machine learning
– reference: (pp. 4243–4247).
– start-page: 400
  year: 2020
  end-page: 404
  ident: b34
  article-title: Training strategies to handle missing modalities for audio-visual expression recognition
  publication-title: Companion publication of the 2020 international conference on multimodal interaction
– volume: 19
  year: 2022
  ident: b42
  article-title: Investigating EEG-based functional connectivity patterns for multimodal emotion recognition
  publication-title: Journal of Neural Engineering
– start-page: 610
  year: 2022
  end-page: 617
  ident: b40
  article-title: EEG-based emotion recognition using partial directed coherence dense graph propagation
  publication-title: 2022 14th international conference on measuring technology and mechatronics automation
– year: 2022
  ident: b44
  article-title: Deep incomplete multi-view clustering via mining cluster complementarity
– volume: 3
  start-page: 18
  year: 2011
  end-page: 31
  ident: b13
  article-title: Deap: A database for emotion analysis; using physiological signals
  publication-title: IEEE Transactions on Affective Computing
– start-page: 1290
  year: 2018
  end-page: 1295
  ident: b39
  article-title: Partial multi-view clustering via consistent GAN
  publication-title: 2018 IEEE international conference on data mining
– volume: 9
  start-page: 94557
  year: 2021
  end-page: 94572
  ident: b15
  article-title: Multimodal emotion recognition fusion analysis adapting BERT with heterogeneous feature unification
  publication-title: IEEE Access
– volume: 3
  start-page: 211
  year: 2011
  end-page: 223
  ident: b37
  article-title: Multimodal emotion recognition in response to videos
  publication-title: IEEE Transactions on Affective Computing
– year: 2023
  ident: b3
  article-title: Hybrid network using dynamic graph convolution and temporal self-attention for EEG-based emotion recognition
  publication-title: IEEE Transactions on Neural Networks and Learning Systems
– volume: 35
  year: 2023
  ident: b12
  article-title: Hybrid convolutional neural network and long short-term memory approach for facial expression recognition
  publication-title: Intelligent Automation & Soft Computing
– reference: (pp. 21064–21075).
– reference: Lin, Y., Gou, Y., Liu, Z., Li, B., Lv, J., & Peng, X. (2021). COMPLETER: Incomplete multi-view clustering via contrastive prediction. In
– volume: 122
  start-page: 279
  year: 2020
  end-page: 288
  ident: b11
  article-title: Partition level multiview subspace clustering
  publication-title: Neural Networks
– reference: Praveen, R. G., de Melo, W. C., Ullah, N., Aslam, H., Zeeshan, O., Denorme, T., et al. (2022b). A joint cross-attention model for audio-visual fusion in dimensional emotion recognition. In
– reference: (pp. 185–193).
– reference: (pp. 11174–11183).
– volume: 61
  start-page: 622
  year: 2012
  end-page: 632
  ident: b46
  article-title: Multi-source feature learning for joint analysis of incomplete multiple heterogeneous neuroimaging data
  publication-title: NeuroImage
– reference: Wang, H., Chen, Y., Ma, C., Avery, J., Hull, L., & Carneiro, G. (2023). Multi-Modal Learning With Missing Modality via Shared-Specific Feature Modelling. In
– volume: 29
  start-page: 1574
  year: 2022
  end-page: 1578
  ident: b7
  article-title: EEG-GCN: Spatio-temporal and self-adaptive graph convolutional networks for single and multi-view EEG-based emotion recognition
  publication-title: IEEE Signal Processing Letters
– reference: (pp. 5661–5671).
– reference: Ma, M., Ren, J., Zhao, L., Tulyakov, S., Wu, C., & Peng, X. (2021). Smil: Multimodal learning with severely missing modality. In
– start-page: 411
  year: 2023
  end-page: 422
  ident: b26
  article-title: Multimodal reconstruct and align net for missing modality problem in sentiment analysis
  publication-title: International conference on multimedia modeling
– start-page: 1
  year: 2021
  end-page: 6
  ident: b27
  article-title: An efficient approach for audio-visual emotion recognition with missing labels and missing modalities
  publication-title: 2021 IEEE international conference on multimedia and expo
– start-page: 521
  year: 2016
  end-page: 529
  ident: b24
  article-title: Emotion recognition using multimodal deep learning
  publication-title: Neural information processing: 23rd international conference, ICONIP 2016, Kyoto, Japan, October 16–21, 2016, proceedings, Part II 23
– year: 2021
  ident: b29
  article-title: Maximum likelihood estimation for multimodal learning with missing modality
– start-page: 350
  year: 2021
  end-page: 357
  ident: b30
  article-title: Multimodal emotion recognition with high-level speech and text features
  publication-title: 2021 IEEE automatic speech recognition and understanding workshop
– start-page: 1
  year: 2022
  end-page: 5
  ident: b32
  article-title: Audio-video fusion with double attention for multimodal emotion recognition
  publication-title: 2022 IEEE 14th image, video, and multidimensional signal processing workshop
– reference: Praveen, R. G., de Melo, W. C., Ullah, N., Aslam, H., Zeeshan, O., Denorme, T., et al. (2022a). A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional Emotion Recognition. In
– reference: (pp. 2486–2495).
– reference: Wen, J., Zhang, Z., Zhang, Z., Zhu, L., Fei, L., Zhang, B., et al. (2021). Unified tensor framework for incomplete multi-view clustering and missing-view inferring. In
– year: 2019
  ident: b21
  article-title: Multimodal emotion recognition using deep canonical correlation analysis
– volume: 49
  start-page: 1110
  year: 2018
  end-page: 1122
  ident: b48
  article-title: Emotionmeter: A multimodal framework for recognizing human emotions
  publication-title: IEEE Transactions on Cybernetics
– reference: John, V., & Kawanishi, Y. (2022). A Multimodal Sensor Fusion Framework Robust to Missing Modalities for Person Recognition. In
– volume: 14
  start-page: 715
  year: 2022
  end-page: 729
  ident: b23
  article-title: Comparing recognition performance and robustness of multimodal deep learning models for multimodal emotion recognition
  publication-title: IEEE Transactions on Cognitive and Developmental Systems
– start-page: 3956
  year: 2019
  end-page: 3960
  ident: b16
  article-title: Audio feature generation for missing modality problem in video action recognition
  publication-title: ICASSP 2019-2019 IEEE international conference on acoustics, speech and signal processing
– year: 2023
  ident: 10.1016/j.neunet.2024.106111_b3
  article-title: Hybrid network using dynamic graph convolution and temporal self-attention for EEG-based emotion recognition
  publication-title: IEEE Transactions on Neural Networks and Learning Systems
– ident: 10.1016/j.neunet.2024.106111_b35
  doi: 10.1109/CVPRW56347.2022.00278
– year: 2020
  ident: 10.1016/j.neunet.2024.106111_b47
  article-title: Deep partial multi-view learning
  publication-title: IEEE Transactions on Pattern Analysis and Machine Intelligence
  doi: 10.1109/TPAMI.2020.3037734
– volume: 17
  year: 2023
  ident: 10.1016/j.neunet.2024.106111_b17
  article-title: STGATE: Spatial-temporal graph attention network with a transformer encoder for EEG-based emotion recognition
  publication-title: Frontiers in Human Neuroscience
  doi: 10.3389/fnhum.2023.1169949
– year: 2022
  ident: 10.1016/j.neunet.2024.106111_b18
  article-title: Smin: Semi-supervised multi-modal interaction network for conversational emotion recognition
  publication-title: IEEE Transactions on Affective Computing
– start-page: 3956
  year: 2019
  ident: 10.1016/j.neunet.2024.106111_b16
  article-title: Audio feature generation for missing modality problem in video action recognition
– ident: 10.1016/j.neunet.2024.106111_b36
  doi: 10.1109/CVPRW56347.2022.00278
– volume: 29
  start-page: 1574
  year: 2022
  ident: 10.1016/j.neunet.2024.106111_b7
  article-title: EEG-GCN: Spatio-temporal and self-adaptive graph convolutional networks for single and multi-view EEG-based emotion recognition
  publication-title: IEEE Signal Processing Letters
  doi: 10.1109/LSP.2022.3179946
– ident: 10.1016/j.neunet.2024.106111_b5
  doi: 10.1109/CVPRW56347.2022.00511
– start-page: 1359
  year: 2014
  ident: 10.1016/j.neunet.2024.106111_b25
  article-title: Randomized nonlinear component analysis
– start-page: 1
  year: 2021
  ident: 10.1016/j.neunet.2024.106111_b27
  article-title: An efficient approach for audio-visual emotion recognition with missing labels and missing modalities
– start-page: 162
  year: 1992
  ident: 10.1016/j.neunet.2024.106111_b9
  article-title: Relations between two sets of variates
– volume: 14
  start-page: 715
  issue: 2
  year: 2021
  ident: 10.1016/j.neunet.2024.106111_b22
  article-title: Comparing recognition performance and robustness of multimodal deep learning models for multimodal emotion recognition
  publication-title: IEEE Transactions on Cognitive and Developmental Systems
  doi: 10.1109/TCDS.2021.3071170
– volume: 122
  start-page: 279
  year: 2020
  ident: 10.1016/j.neunet.2024.106111_b11
  article-title: Partition level multiview subspace clustering
  publication-title: Neural Networks
  doi: 10.1016/j.neunet.2019.10.010
– volume: 3
  start-page: 18
  issue: 1
  year: 2011
  ident: 10.1016/j.neunet.2024.106111_b13
  article-title: Deap: A database for emotion analysis; using physiological signals
  publication-title: IEEE Transactions on Affective Computing
  doi: 10.1109/T-AFFC.2011.15
– ident: 10.1016/j.neunet.2024.106111_b41
  doi: 10.1609/aaai.v35i11.17231
– volume: 43
  start-page: 2634
  issue: 8
  year: 2020
  ident: 10.1016/j.neunet.2024.106111_b20
  article-title: Efficient and effective regularized incomplete multi-view clustering
  publication-title: IEEE Transactions on Pattern Analysis and Machine Intelligence
– volume: 61
  start-page: 622
  issue: 3
  year: 2012
  ident: 10.1016/j.neunet.2024.106111_b46
  article-title: Multi-source feature learning for joint analysis of incomplete multiple heterogeneous neuroimaging data
  publication-title: NeuroImage
  doi: 10.1016/j.neuroimage.2012.03.059
– volume: 35
  issue: 1
  year: 2023
  ident: 10.1016/j.neunet.2024.106111_b12
  article-title: Hybrid convolutional neural network and long short-term memory approach for facial expression recognition
  publication-title: Intelligent Automation & Soft Computing
  doi: 10.32604/iasc.2023.025437
– ident: 10.1016/j.neunet.2024.106111_b14
  doi: 10.21437/Interspeech.2020-1190
– volume: 3
  start-page: 211
  issue: 2
  year: 2011
  ident: 10.1016/j.neunet.2024.106111_b37
  article-title: Multimodal emotion recognition in response to videos
  publication-title: IEEE Transactions on Affective Computing
  doi: 10.1109/T-AFFC.2011.37
– volume: 14
  start-page: 715
  issue: 2
  year: 2022
  ident: 10.1016/j.neunet.2024.106111_b23
  article-title: Comparing recognition performance and robustness of multimodal deep learning models for multimodal emotion recognition
  publication-title: IEEE Transactions on Cognitive and Developmental Systems
  doi: 10.1109/TCDS.2021.3071170
– year: 2021
  ident: 10.1016/j.neunet.2024.106111_b29
– ident: 10.1016/j.neunet.2024.106111_b38
  doi: 10.1109/CVPR52729.2023.01524
– start-page: 400
  year: 2020
  ident: 10.1016/j.neunet.2024.106111_b34
  article-title: Training strategies to handle missing modalities for audio-visual expression recognition
– volume: 20
  start-page: 1956
  issue: 4
  year: 2010
  ident: 10.1016/j.neunet.2024.106111_b2
  article-title: A singular value thresholding algorithm for matrix completion
  publication-title: SIAM Journal on optimization
  doi: 10.1137/080738970
– volume: 10
  start-page: 4589
  issue: 10
  year: 2017
  ident: 10.1016/j.neunet.2024.106111_b6
  article-title: Hyperspectral image restoration using low-rank tensor recovery
  publication-title: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
  doi: 10.1109/JSTARS.2017.2714338
– volume: 9
  start-page: 94557
  year: 2021
  ident: 10.1016/j.neunet.2024.106111_b15
  article-title: Multimodal emotion recognition fusion analysis adapting BERT with heterogeneous feature unification
  publication-title: IEEE Access
  doi: 10.1109/ACCESS.2021.3092735
– start-page: 411
  year: 2023
  ident: 10.1016/j.neunet.2024.106111_b26
  article-title: Multimodal reconstruct and align net for missing modality problem in sentiment analysis
– start-page: 1290
  year: 2018
  ident: 10.1016/j.neunet.2024.106111_b39
  article-title: Partial multi-view clustering via consistent GAN
– start-page: 350
  year: 2021
  ident: 10.1016/j.neunet.2024.106111_b30
  article-title: Multimodal emotion recognition with high-level speech and text features
– start-page: 1
  year: 2022
  ident: 10.1016/j.neunet.2024.106111_b32
  article-title: Audio-video fusion with double attention for multimodal emotion recognition
– start-page: 521
  year: 2016
  ident: 10.1016/j.neunet.2024.106111_b24
  article-title: Emotion recognition using multimodal deep learning
– start-page: 610
  year: 2022
  ident: 10.1016/j.neunet.2024.106111_b40
  article-title: EEG-based emotion recognition using partial directed coherence dense graph propagation
– ident: 10.1016/j.neunet.2024.106111_b19
  doi: 10.1109/CVPR46437.2021.01102
– ident: 10.1016/j.neunet.2024.106111_b45
  doi: 10.1145/3474085.3475585
– ident: 10.1016/j.neunet.2024.106111_b10
  doi: 10.1145/3551626.3564965
– volume: 19
  issue: 1
  year: 2022
  ident: 10.1016/j.neunet.2024.106111_b42
  article-title: Investigating EEG-based functional connectivity patterns for multimodal emotion recognition
  publication-title: Journal of Neural Engineering
  doi: 10.1088/1741-2552/ac49a7
– year: 2019
  ident: 10.1016/j.neunet.2024.106111_b21
– ident: 10.1016/j.neunet.2024.106111_b28
  doi: 10.1609/aaai.v35i3.16330
– volume: 49
  start-page: 1110
  issue: 3
  year: 2018
  ident: 10.1016/j.neunet.2024.106111_b48
  article-title: Emotionmeter: A multimodal framework for recognizing human emotions
  publication-title: IEEE Transactions on Cybernetics
  doi: 10.1109/TCYB.2018.2797176
– ident: 10.1016/j.neunet.2024.106111_b8
  doi: 10.1109/CVPR52688.2022.02039
– year: 2022
  ident: 10.1016/j.neunet.2024.106111_b44
– ident: 10.1016/j.neunet.2024.106111_b31
  doi: 10.1109/CVPR46437.2021.00561
– volume: 139
  start-page: 1
  year: 2022
  ident: 10.1016/j.neunet.2024.106111_b33
  article-title: Multi-modal speech emotion recognition using self-attention mechanism and multi-scale fusion framework
  publication-title: Speech Communication
  doi: 10.1016/j.specom.2022.02.006
– ident: 10.1016/j.neunet.2024.106111_b43
  doi: 10.1145/2487575.2487594
– year: 2022
  ident: 10.1016/j.neunet.2024.106111_b4
  article-title: Multi-domain encoding of spatiotemporal dynamics in EEG for emotion recognition
  publication-title: IEEE Journal of Biomedical and Health Informatics
– start-page: 1247
  year: 2013
  ident: 10.1016/j.neunet.2024.106111_b1
  article-title: Deep canonical correlation analysis
SSID ssj0006843
Score 2.5276632
Snippet Multi-modal signals have become essential data for emotion recognition since they can represent emotions more comprehensively. However, in real-world...
SourceID proquest
pubmed
crossref
elsevier
SourceType Aggregation Database
Index Database
Enrichment Source
Publisher
StartPage 106111
SubjectTerms Convolutional encoder
Emotion recognition
Incomplete data
Multi-modal signals
Transformer autoencoder
Title A novel transformer autoencoder for multi-modal emotion recognition with incomplete data
URI https://dx.doi.org/10.1016/j.neunet.2024.106111
https://www.ncbi.nlm.nih.gov/pubmed/38237444
https://www.proquest.com/docview/2929058734
Volume 172
WOSCitedRecordID wos001163939200001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVESC
  databaseName: Elsevier SD Freedom Collection Journals 2021
  customDbUrl:
  eissn: 1879-2782
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0006843
  issn: 0893-6080
  databaseCode: AIEXJ
  dateStart: 19950101
  isFulltext: true
  titleUrlDefault: https://www.sciencedirect.com
  providerName: Elsevier
link http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwtV3fa9swEBZpuoe97PePdFvRYG_GwbEV2XoMpWEdpQzWDbMXI8sXktLaJbVD2F_fkyUrHqV0G-zFGBFJju7L5Tv50x0hnyDhTOYMfFFEGKDEeeInijMfmXYeIwYUazdzfpzGZ2dJmoqvg8FJdxZmcxmXZbLdiuv_ampsQ2Pro7N_YW43KDbgPRodr2h2vP6R4WdeWW1AS8gtJYW1J5u60hkrdeIIrStsZYT-VVWggcAU8vGclKgyJ7s9nbdB5w6uwbMn2ByN1Sk9sGtpNOSOlh8twbiO9sZpfVZNq-SD8tfSwWhuNl5_LmW1XTmAzm3_U9tktyPCvoql3SPrzsnsREmtKxORzwNTs8n5XVOz544PN9sJF-MSGvwaYz3JWAeu1in_nh37mx5aj6y1sMjf0j2yH2IMFAzJ_uzkOP3i_pZ5YiSU3aN05yhbsd_due7jKffFIS0fOX9GnthAgs4MAJ6TAZQvyNOuSAe1PvslSWe0xQPt4YH28ECxifbwQC0eaA8PVOOB7vBANR5eke_z4_Ojz74tp-ErZCm1LxYhLziEAeQcWXfBEsm5yKd6k0Aynk_UNOIKeB7wQkEc51ERKKEWUx5FgEQxek2GZVXCW0IFsAmTaoHNjEkmpVRKQDRRSO9BQD4iUbd4mbK55nXJk8usExVeZGbJM73kmVnyEfFdr2uTa-WBz8edXTLLFw0PzBBKD_T82JkxQ3eq35HJEqrmJgsxXAimSRyxEXlj7OueRb8yjxljB_887zvyePebeU-G9bqBD-SR2tSrm_Uh2YvT5NDi9haerqj7
linkProvider Elsevier
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=A+novel+transformer+autoencoder+for+multi-modal+emotion+recognition+with+incomplete+data&rft.jtitle=Neural+networks&rft.au=Cheng%2C+Cheng&rft.au=Liu%2C+Wenzhe&rft.au=Fan%2C+Zhaoxin&rft.au=Feng%2C+Lin&rft.date=2024-04-01&rft.pub=Elsevier+Ltd&rft.issn=0893-6080&rft.volume=172&rft_id=info:doi/10.1016%2Fj.neunet.2024.106111&rft.externalDocID=S089360802400025X
thumbnail_l http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/lc.gif&issn=0893-6080&client=summon
thumbnail_m http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/mc.gif&issn=0893-6080&client=summon
thumbnail_s http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/sc.gif&issn=0893-6080&client=summon