Modality-uncertainty-aware knowledge distillation framework for multimodal sentiment analysis

Bibliographic Details
Published in: Complex & Intelligent Systems, Vol. 12, No. 1, pp. 14-22
Main Authors: Wang, Nan; Wang, Qi
Format: Journal Article
Language: English
Published: Cham: Springer International Publishing, 01.01.2026
ISSN: 2199-4536, 2198-6053
Online Access: Full text
Description
Abstract: Multimodal sentiment analysis (MSA) has become increasingly important for understanding human emotions, with applications in areas such as human-computer interaction, social media analysis, and emotion recognition. MSA leverages multimodal data, including text, audio, and visual inputs, to achieve better performance in emotion recognition. However, existing methods face challenges, particularly when dealing with missing modalities. While some approaches attempt to handle modality dropout, they often fail to effectively recover missing information or to account for the complex interactions between different modalities. Moreover, many models treat all modalities equally and do not fully exploit the unique strengths of each. To address these limitations, we propose the Modality-Uncertainty-aware Knowledge Distillation Framework (MUKDF). Specifically, we introduce a modality random missing strategy that enhances the model's adaptability to uncertain modality scenarios. To further improve performance, we incorporate a Dual-Branch Modality Knowledge Extractor (DMKE) that balances feature contributions across modalities and a multimodal masked transformer (MMT) designed to capture nuanced interactions between modalities. Additionally, we present a contrastive feature-level and align-based representation distillation mechanism (CFD&ARD), which strengthens the alignment between teacher and student representations, ensuring effective knowledge transfer and improved robustness in learning. Comprehensive experiments on two benchmark datasets demonstrate that MUKDF outperforms several baseline models, achieving superior performance not only under complete-modality conditions but also in the more challenging scenario of incomplete modalities. This highlights the effectiveness of our framework in handling the uncertainty and complexity inherent in multimodal sentiment analysis.
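
The record does not include code, but two of the mechanisms named in the abstract lend themselves to a brief illustration. The sketch below, assuming a PyTorch setting, shows a generic modality random missing strategy and an InfoNCE-style contrastive feature-level distillation loss; all names and hyperparameters (random_modality_missing, drop_prob, temperature) are hypothetical and are not taken from the MUKDF paper itself.

    # Illustrative sketch only: names and hyperparameters are assumptions,
    # not the authors' implementation.
    import random
    import torch
    import torch.nn.functional as F

    def random_modality_missing(feats, drop_prob=0.3):
        """Randomly zero out whole modalities during training, always keeping
        at least one, so the model learns to cope with uncertain inputs."""
        names = list(feats)
        keep = {n: random.random() >= drop_prob for n in names}
        if not any(keep.values()):           # guarantee one surviving modality
            keep[random.choice(names)] = True
        return {n: f if keep[n] else torch.zeros_like(f) for n, f in feats.items()}

    def contrastive_feature_distillation(student, teacher, temperature=0.1):
        """InfoNCE-style loss: pull each student embedding toward the matching
        teacher embedding (diagonal positives), away from other batch samples."""
        s = F.normalize(student, dim=-1)
        t = F.normalize(teacher.detach(), dim=-1)  # teacher side is not updated
        logits = s @ t.T / temperature             # (B, B) cosine-similarity matrix
        targets = torch.arange(s.size(0))          # student i matches teacher row i
        return F.cross_entropy(logits, targets)

    # Toy usage: a batch of 8 samples with 64-dim features per modality.
    feats = {m: torch.randn(8, 64) for m in ("text", "audio", "visual")}
    dropped = random_modality_missing(feats)
    loss = contrastive_feature_distillation(torch.randn(8, 64), torch.randn(8, 64))

The diagonal-positive formulation above is one common way to align teacher and student representations batch-wise; how MUKDF combines it with the align-based representation distillation branch is detailed in the full text.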
DOI: 10.1007/s40747-025-02135-w