A novel transformer autoencoder for multi-modal emotion recognition with incomplete data
| Published in: | Neural Networks, Vol. 172, Article 106111 |
|---|---|
| Main Authors: | Cheng Cheng, Wenzhe Liu, Zhaoxin Fan, Lin Feng, Ziyu Jia |
| Format: | Journal Article |
| Language: | English |
| Published: | United States: Elsevier Ltd, 01.04.2024 |
| Subjects: | Emotion recognition; Incomplete data; Multi-modal signals; Convolutional encoder; Transformer autoencoder |
| ISSN: | 0893-6080 (print); 1879-2782 (online) |
| Online Access: | https://dx.doi.org/10.1016/j.neunet.2024.106111 |
| Abstract | Multi-modal signals have become essential data for emotion recognition because they represent emotions more comprehensively. In real-world environments, however, complete multi-modal data are often impossible to acquire, and missing modalities cause severe performance degradation in emotion recognition. This paper therefore presents the first attempt to use a transformer-based architecture to impute modality-incomplete data from partially observed data for multi-modal emotion recognition (MER). Concretely, it proposes a unified model called the transformer autoencoder (TAE), comprising a modality-specific hybrid transformer encoder, an inter-modality transformer encoder, and a convolutional decoder. The modality-specific hybrid transformer encoder bridges a convolutional encoder and a transformer encoder, allowing it to learn both local and global context information within each modality. The inter-modality transformer encoder builds and aligns global cross-modal correlations and models long-range contextual information across modalities. The convolutional decoder decodes the encoded features to produce more precise recognition. In addition, a regularization term is introduced into the convolutional decoder to force it to fully leverage both complete and incomplete data for emotion recognition on missing data. Accuracies of 96.33%, 95.64%, and 92.69% are attained on the available data of the DEAP and SEED-IV datasets, and accuracies of 93.25%, 92.23%, and 81.76% are obtained on the missing data. In particular, the model gains a 5.61% advantage with 70% of the data missing, demonstrating that it outperforms several state-of-the-art approaches in incomplete multi-modal learning. |
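For readers who want a concrete picture of the pipeline the abstract describes, the sketch below reconstructs a minimal TAE in PyTorch: per-modality hybrid encoders (a convolutional front-end bridged to a transformer), an inter-modality transformer over the concatenated modality tokens, convolutional decoders that rebuild each modality, and a classification head. All layer sizes, module names, and the combined classification-plus-reconstruction loss are illustrative assumptions, not the authors' exact design.

```python
# Minimal PyTorch sketch of a TAE-style model as described in the abstract.
# Dimensions, layer counts, and loss weighting are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityEncoder(nn.Module):
    """Hybrid encoder: convolutional front-end (local context) bridged to a
    transformer encoder (global context) within a single modality."""
    def __init__(self, in_channels, d_model=64, n_layers=2):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels, d_model, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(d_model, d_model, kernel_size=3, padding=1), nn.ReLU(),
        )
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, n_layers)

    def forward(self, x):                      # x: (batch, channels, time)
        h = self.conv(x).transpose(1, 2)       # -> (batch, time, d_model)
        return self.transformer(h)

class TAE(nn.Module):
    def __init__(self, modality_channels, d_model=64, n_classes=4):
        super().__init__()
        self.encoders = nn.ModuleList(
            [ModalityEncoder(c, d_model) for c in modality_channels])
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.cross_modal = nn.TransformerEncoder(layer, 2)  # inter-modality encoder
        self.decoders = nn.ModuleList(                      # convolutional decoders
            [nn.Conv1d(d_model, c, kernel_size=3, padding=1)
             for c in modality_channels])
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, xs):                     # xs: list of (batch, channels, time)
        feats = [enc(x) for enc, x in zip(self.encoders, xs)]
        fused = self.cross_modal(torch.cat(feats, dim=1))   # align across modalities
        recons, start = [], 0
        for dec, f in zip(self.decoders, feats):            # split fused tokens back
            seg = fused[:, start:start + f.size(1)]         # per modality and decode
            recons.append(dec(seg.transpose(1, 2)))
            start += f.size(1)
        logits = self.classifier(fused.mean(dim=1))
        return logits, recons

# Toy usage: two modalities (e.g., 32 EEG channels and 8 peripheral channels).
model = TAE([32, 8])
xs = [torch.randn(4, 32, 128), torch.randn(4, 8, 128)]
xs[1] = torch.zeros_like(xs[1])                # simulate a missing modality
logits, recons = model(xs)
targets = torch.randint(0, 4, (4,))
# Classification loss plus a reconstruction term standing in for the paper's
# regularizer: the decoder must rebuild every modality from the fused code.
loss = F.cross_entropy(logits, targets) + \
       sum(F.mse_loss(r, x) for r, x in zip(recons, xs))
loss.backward()
```

Under this reading, the reconstruction term plays the role of the abstract's regularizer: training examples with masked (missing) modalities are still pushed to reproduce the observed signals, so the decoder learns to impute from partial data. The exact masking and weighting scheme used by the authors is not specified here.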
| ArticleNumber | 106111 |
| Authors | Cheng Cheng (ORCID 0000-0002-2138-6286; Department of Computer Science and Technology, Dalian University of Technology, Dalian, China); Wenzhe Liu (School of Information Engineering, Huzhou University, Huzhou, China); Zhaoxin Fan (Renmin University of China, Psyche AI Inc, Beijing, China); Lin Feng (fenglin@dlut.edu.cn; Department of Computer Science and Technology, Dalian University of Technology, Dalian, China); Ziyu Jia (Brainnetome Center, Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Beijing, China) |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/38237444 (View this record in MEDLINE/PubMed) |
| Copyright | Copyright © 2024 Elsevier Ltd. All rights reserved. |
| DOI | 10.1016/j.neunet.2024.106111 |
| Discipline | Computer Science |
| EISSN | 1879-2782 |
| ExternalDocumentID | PMID: 38237444; DOI: 10.1016/j.neunet.2024.106111; PII: S089360802400025X |
| ISICitedReferencesCount | 23 |
| ISSN | 0893-6080 1879-2782 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | Emotion recognition Incomplete data Multi-modal signals Convolutional encoder Transformer autoencoder |
| Language | English |
| License | Copyright © 2024 Elsevier Ltd. All rights reserved. |
| ORCID | 0000-0002-2138-6286 |
| PMID | 38237444 |
| PublicationDate | 2024-04-01 |
| PublicationPlace | United States |
| PublicationTitle | Neural networks |
| PublicationTitleAlternate | Neural Netw |
| PublicationYear | 2024 |
| Publisher | Elsevier Ltd |
| StartPage | 106111 |
| Title | A novel transformer autoencoder for multi-modal emotion recognition with incomplete data |
| URI | https://dx.doi.org/10.1016/j.neunet.2024.106111 https://www.ncbi.nlm.nih.gov/pubmed/38237444 https://www.proquest.com/docview/2929058734 |
| Volume | 172 |