Modality-uncertainty-aware knowledge distillation framework for multimodal sentiment analysis
| Published in: | Complex & Intelligent Systems, Volume 12, Issue 1, pp. 14-22 |
|---|---|
| Main authors: | , |
| Format: | Journal Article |
| Language: | English |
| Published: | Cham: Springer International Publishing, 01.01.2026 |
| Topics: | |
| ISSN: | 2199-4536, 2198-6053 |
| Summary: | Multimodal sentiment analysis (MSA) has become increasingly important for understanding human emotions, with applications in areas such as human-computer interaction, social media analysis, and emotion recognition. MSA leverages multimodal data, including text, audio, and visual inputs, to achieve better performance in emotion recognition. However, existing methods face challenges, particularly when dealing with missing modalities. While some approaches attempt to handle modality dropout, they often fail to effectively recover missing information or account for the complex interactions between different modalities. Moreover, many models treat modalities equally, not fully utilizing the unique strengths of each modality. To address these limitations, we propose the Modality-Uncertainty-aware Knowledge Distillation Framework (MUKDF). Specifically, we introduce a modality random missing strategy that enhances the model's adaptability to uncertain modality scenarios. To further improve performance, we incorporate a Dual-Branch Modality Knowledge Extractor (DMKE) that balances feature contributions across modalities and a multimodal masked transformer (MMT) designed to capture nuanced interactions between modalities. Additionally, we present a contrastive feature-level and align-based representation distillation mechanism (CFD&ARD), which strengthens the alignment between teacher and student representations, ensuring effective knowledge transfer and improved robustness in learning. Comprehensive experiments conducted on two benchmark datasets demonstrate that MUKDF outperforms several baseline models, achieving superior performance not only under complete modality conditions but also in the more challenging scenario with incomplete modalities. This highlights the effectiveness of our framework in handling the uncertainty and complexities inherent in multimodal sentiment analysis. |
|---|---|
| DOI: | 10.1007/s40747-025-02135-w |
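
The abstract names two mechanisms concrete enough to sketch: training under randomly missing modalities and contrastive feature-level distillation between teacher and student representations. The PyTorch sketch below is a minimal illustration of those two ideas, not the paper's implementation; the function names, the `p_missing` rate, and the InfoNCE-style formulation of the contrastive term are all assumptions made here for illustration.

```python
# Minimal sketch (assumptions, not the paper's code): random modality
# dropout plus a contrastive teacher-student distillation loss.
import torch
import torch.nn.functional as F

def random_modality_missing(feats: dict, p_missing: float = 0.3) -> dict:
    """Zero out whole modalities at random during training.

    feats maps a modality name ("text", "audio", "visual") to a tensor
    of shape (batch, seq_len, dim). At least one modality is always
    kept so every sample remains informative.
    """
    drop = [m for m in feats if torch.rand(1).item() < p_missing]
    if len(drop) == len(feats):                      # never drop everything
        drop.pop(torch.randint(len(drop), (1,)).item())
    return {m: torch.zeros_like(x) if m in drop else x
            for m, x in feats.items()}

def contrastive_distillation_loss(student: torch.Tensor,
                                  teacher: torch.Tensor,
                                  temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE-style alignment: each student vector (batch, dim) should
    match its own teacher vector against the other samples in the batch.
    The teacher is detached so gradients flow only into the student.
    """
    s = F.normalize(student, dim=-1)
    t = F.normalize(teacher.detach(), dim=-1)
    logits = s @ t.T / temperature                   # (batch, batch) similarities
    targets = torch.arange(s.size(0), device=s.device)
    return F.cross_entropy(logits, targets)

# Illustrative use: the teacher encodes complete inputs, the student
# encodes the randomly degraded ones, and the loss aligns the two.
if __name__ == "__main__":
    feats = {m: torch.randn(8, 20, 64) for m in ("text", "audio", "visual")}
    degraded = random_modality_missing(feats)
    student_repr = torch.randn(8, 128)               # stand-in for student encoder output
    teacher_repr = torch.randn(8, 128)               # stand-in for teacher encoder output
    print(contrastive_distillation_loss(student_repr, teacher_repr).item())
```

In a MUKDF-style setup, the teacher would see complete modalities while the student receives the output of `random_modality_missing`, with the contrastive term pulling each degraded student representation toward its own teacher representation.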