Content-aware sentiment understanding: cross-modal analysis with encoder-decoder architectures

Bibliographic Details
Published in: Journal of Computational Social Science, Vol. 8, Issue 2
Main Authors: Pakdaman, Zahra; Koochari, Abbas; Sharifi, Arash
Format: Journal Article
Language: English
Published: Singapore: Springer Nature Singapore, 01.05.2025
ISSN: 2432-2717, 2432-2725
Online Access: Full text
Description
Summary: The analysis of sentiment from social media data has attracted significant attention due to the proliferation of user-generated opinions and comments on these platforms. Social media content is often multi-modal, frequently combining images and text within single posts. To effectively estimate user sentiment across multiple content types, this study proposes a multimodal content-aware approach. It distinguishes between text-dominant images, memes, and regular images, and extracts the embedded text from memes and text-dominant images. A Swin Transformer-GPT-2 (encoder-decoder) architecture then generates captions for the images. The user's sentiment is estimated by passing the embedded text, the generated captions, and the user-provided captions through a BiLSTM-LSTM (encoder-decoder) architecture followed by fully connected layers. The proposed method demonstrates superior performance, achieving 93% accuracy on the MVSA-Single dataset, 79% accuracy on the MVSA-Multiple dataset, and 90% accuracy on the TWITTER (Large) dataset, surpassing current state-of-the-art methods.
DOI: 10.1007/s42001-025-00374-y
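
The summary above first describes a routing step that separates memes and text-dominant images (whose embedded text is extracted) from regular images (which are sent to captioning). The abstract names neither the content-type classifier nor the OCR engine, so the following is a minimal sketch assuming pytesseract and a simple text-area heuristic; the threshold and confidence cutoff are illustrative assumptions, not the authors' method.

```python
# Hedged sketch of the routing step: separate memes / text-dominant images
# (OCR their embedded text) from regular images (caption them instead).
# pytesseract and the text-area heuristic below are stand-in assumptions;
# the abstract does not specify the classifier or OCR engine used.
from PIL import Image
import pytesseract

def route_image(path: str, text_area_threshold: float = 0.2):
    """Return ("text", extracted_text) for text-heavy images, else ("image", None)."""
    img = Image.open(path).convert("RGB")
    # Word-level OCR boxes let us estimate the fraction of the image covered by text.
    data = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)
    boxes = [
        (w, h)
        for w, h, conf in zip(data["width"], data["height"], data["conf"])
        if str(conf).isdigit() and int(conf) > 60  # keep confident word detections
    ]
    text_area = sum(w * h for w, h in boxes) / float(img.width * img.height)
    if text_area > text_area_threshold:  # assumed cutoff for "text-dominant"
        return "text", pytesseract.image_to_string(img).strip()
    return "image", None
```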
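
For the caption-generation stage, the summary names a Swin Transformer-GPT-2 encoder-decoder. A minimal sketch, assuming Hugging Face's VisionEncoderDecoderModel as the glue and public checkpoints as starting points; the authors' training details are not given in the abstract, and the combined model would need fine-tuning on a captioning corpus before its captions are useful.

```python
# Hedged sketch of the caption-generation stage: a Swin Transformer encoder
# paired with a GPT-2 decoder. The checkpoints and decoding settings are
# illustrative assumptions, not the paper's published configuration.
from PIL import Image
from transformers import AutoImageProcessor, AutoTokenizer, VisionEncoderDecoderModel

model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    "microsoft/swin-base-patch4-window7-224", "gpt2"
)
processor = AutoImageProcessor.from_pretrained("microsoft/swin-base-patch4-window7-224")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# GPT-2 defines no dedicated BOS/pad tokens for generation, so reuse EOS.
model.config.decoder_start_token_id = tokenizer.bos_token_id
model.config.pad_token_id = tokenizer.eos_token_id

def caption(path: str) -> str:
    """Generate a caption for one image with beam search."""
    pixel_values = processor(
        Image.open(path).convert("RGB"), return_tensors="pt"
    ).pixel_values
    output_ids = model.generate(pixel_values, max_length=32, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```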
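
Finally, the sentiment estimator is described as a BiLSTM-LSTM encoder-decoder followed by fully connected layers. A minimal PyTorch sketch, with the vocabulary size, layer widths, three-class output, and simple concatenation of the three text sources all assumed for illustration:

```python
# Hedged sketch of the sentiment estimator: a BiLSTM encoder, an LSTM decoder,
# and fully connected layers, as named in the summary. The abstract gives no
# hyperparameters; every dimension here is an assumption.
import torch
import torch.nn as nn

class BiLSTMSentiment(nn.Module):
    def __init__(self, vocab_size=30_000, embed_dim=128, hidden=256, classes=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # Encoder: bidirectional LSTM over the concatenated token sequence
        # (OCR-extracted text + generated caption + user-provided caption).
        self.encoder = nn.LSTM(embed_dim, hidden, batch_first=True, bidirectional=True)
        # Decoder: unidirectional LSTM that summarizes the encoder states.
        self.decoder = nn.LSTM(2 * hidden, hidden, batch_first=True)
        # Fully connected classification head.
        self.fc = nn.Sequential(nn.Linear(hidden, 64), nn.ReLU(), nn.Linear(64, classes))

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        x = self.embed(token_ids)           # (batch, seq, embed_dim)
        enc_out, _ = self.encoder(x)        # (batch, seq, 2 * hidden)
        dec_out, _ = self.decoder(enc_out)  # (batch, seq, hidden)
        return self.fc(dec_out[:, -1, :])   # logits over sentiment classes

# Example: logits = BiLSTMSentiment()(torch.randint(1, 30_000, (8, 120)))
```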