Content-aware sentiment understanding: cross-modal analysis with encoder-decoder architectures
| Published in: | Journal of Computational Social Science, Vol. 8, No. 2 |
|---|---|
| Main Authors: | |
| Format: | Journal Article |
| Language: | English |
| Published: | Singapore: Springer Nature Singapore, 01.05.2025 |
| Subjects: | |
| ISSN: | 2432-2717, 2432-2725 |
| Summary: | The analysis of sentiment from social media data has attracted significant attention due to the proliferation of user-generated opinions and comments on these platforms. Social media content is often multimodal, frequently combining images and text within a single post. To effectively estimate user sentiment across multiple content types, this study proposes a multimodal, content-aware approach. It distinguishes text-dominant images, memes, and regular images, and extracts the embedded text from memes and text-dominant images. Captions for image analysis are generated with a Swin Transformer-GPT-2 (encoder-decoder) architecture. The user's sentiment is then estimated by analyzing the embedded text, the generated captions, and the user-provided captions through a BiLSTM-LSTM (encoder-decoder) architecture with fully connected layers. The proposed method demonstrates superior performance, achieving 93% accuracy on the MVSA-Single dataset, 79% on MVSA-Multiple, and 90% on the TWITTER (Large) dataset, surpassing current state-of-the-art methods. |
|---|---|
| DOI: | 10.1007/s42001-025-00374-y |
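
The pipeline described in the summary lends itself to a short illustrative sketch. Below is a minimal Python sketch of the two encoder-decoder stages: a Swin Transformer encoder paired with a GPT-2 decoder for caption generation (via HuggingFace's `VisionEncoderDecoderModel`), and a BiLSTM-LSTM network with fully connected layers for sentiment classification. The checkpoint names, layer sizes, and three-class label set are assumptions for illustration, not the authors' published configuration.

```python
# Illustrative sketch of the two encoder-decoder stages from the abstract.
# Checkpoint names, dimensions, and the label set are assumptions, not the
# authors' exact setup; the combined captioner's cross-attention weights are
# freshly initialized and would need fine-tuning on caption data.
import torch
import torch.nn as nn
from transformers import AutoImageProcessor, AutoTokenizer, VisionEncoderDecoderModel

# --- Stage 1: image captioning with a Swin Transformer encoder + GPT-2 decoder.
processor = AutoImageProcessor.from_pretrained("microsoft/swin-base-patch4-window7-224")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

captioner = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    "microsoft/swin-base-patch4-window7-224", "gpt2"
)
captioner.config.decoder_start_token_id = tokenizer.bos_token_id
captioner.config.pad_token_id = tokenizer.pad_token_id

def generate_caption(image):
    """Generate a caption for a PIL image (greedy decoding for brevity)."""
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    output_ids = captioner.generate(pixel_values, max_new_tokens=32)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# --- Stage 2: sentiment estimation over the concatenated texts (embedded/OCR
# text, generated caption, user-provided caption) with a BiLSTM encoder, an
# LSTM decoder, and fully connected layers. Sizes are illustrative.
class BiLSTMLSTMSentiment(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, num_classes=3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # Encoder: bidirectional LSTM over the token sequence.
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
        # Decoder: unidirectional LSTM over the encoder's hidden states.
        self.decoder = nn.LSTM(2 * hidden_dim, hidden_dim, batch_first=True)
        # Fully connected head, e.g. positive / neutral / negative.
        self.fc = nn.Sequential(
            nn.Linear(hidden_dim, 64), nn.ReLU(), nn.Linear(64, num_classes)
        )

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)  # (B, T, E)
        enc_out, _ = self.encoder(embedded)   # (B, T, 2H)
        dec_out, _ = self.decoder(enc_out)    # (B, T, H)
        return self.fc(dec_out[:, -1, :])     # class logits from last step

# Hypothetical usage:
#   caption = generate_caption(Image.open("post.jpg"))
#   model = BiLSTMLSTMSentiment(vocab_size=30_000)
#   logits = model(token_ids)  # token_ids: (batch, seq_len) LongTensor
```

The two stages are decoupled by design: the captioner converts the visual modality into text, so the downstream classifier only ever sees token sequences, whichever mix of embedded text, generated captions, and user captions a post supplies.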