Modality-uncertainty-aware knowledge distillation framework for multimodal sentiment analysis

Bibliographic Details
Published in: Complex & Intelligent Systems, Vol. 12, No. 1, pp. 14-22
Main Authors: Wang, Nan; Wang, Qi
Format: Journal Article
Language: English
Published: Cham: Springer International Publishing, 01.01.2026
Springer Nature B.V
Springer
ISSN: 2199-4536, 2198-6053
Description
Summary: Multimodal sentiment analysis (MSA) has become increasingly important for understanding human emotions, with applications in areas such as human-computer interaction, social media analysis, and emotion recognition. MSA leverages multimodal data, including text, audio, and visual inputs, to achieve better performance in emotion recognition. However, existing methods face challenges, particularly when dealing with missing modalities. While some approaches attempt to handle modality dropout, they often fail to effectively recover missing information or account for the complex interactions between different modalities. Moreover, many models treat modalities equally, not fully utilizing the unique strengths of each modality. To address these limitations, we propose the Modality-Uncertainty-aware Knowledge Distillation Framework (MUKDF). Specifically, we introduce a modality random missing strategy that enhances the model's adaptability to uncertain modality scenarios. To further improve performance, we incorporate a Dual-Branch Modality Knowledge Extractor (DMKE) that balances feature contributions across modalities and a multimodal masked transformer (MMT) designed to capture nuanced interactions between modalities. Additionally, we present a contrastive feature-level and align-based representation distillation mechanism (CFD&ARD), which strengthens the alignment between teacher and student representations, ensuring effective knowledge transfer and improved robustness in learning. Comprehensive experiments conducted on two benchmark datasets demonstrate that MUKDF outperforms several baseline models, achieving superior performance not only under complete modality conditions but also in the more challenging scenario with incomplete modalities. This highlights the effectiveness of our framework in handling the uncertainty and complexities inherent in multimodal sentiment analysis.
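The summary mentions a "modality random missing strategy" for simulating uncertain modality scenarios during training. The paper's actual implementation is not given in this record; as a rough, hypothetical sketch, randomly dropping whole modalities (while always keeping at least one) could look like the following, where `apply_random_modality_missing` and `p_missing` are illustrative names, not the authors' API:

```python
import random

def apply_random_modality_missing(features, p_missing=0.3, rng=None):
    """Zero out randomly chosen modalities to simulate missing inputs.

    features: dict mapping modality name (e.g. "text", "audio", "visual")
              to a flat list of feature values.
    p_missing: independent probability of dropping each modality.
    Always retains at least one modality so the sample stays usable.
    This is an illustrative sketch, not the MUKDF implementation.
    """
    rng = rng or random.Random()
    names = list(features)
    dropped = [m for m in names if rng.random() < p_missing]
    if len(dropped) == len(names):
        # Never drop everything: restore one randomly chosen modality.
        dropped.remove(rng.choice(dropped))
    return {
        m: ([0.0] * len(v) if m in dropped else list(v))
        for m, v in features.items()
    }

# Example: with p_missing=1.0 exactly one modality survives.
feats = {"text": [1.0, 2.0], "audio": [3.0], "visual": [4.0, 5.0]}
masked = apply_random_modality_missing(feats, p_missing=1.0,
                                       rng=random.Random(0))
```

In a distillation setup such as the one the abstract describes, the teacher would typically see the complete `feats` while the student receives `masked`, encouraging the student to stay robust when modalities are absent.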
DOI: 10.1007/s40747-025-02135-w