An inter-modal attention-based deep learning framework using unified modality for multimodal fake news, hate speech and offensive language detection
| Published in: | Information systems (Oxford) Vol. 123; p. 102378 |
|---|---|
| Main Authors: | , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Ltd, 01.07.2024 |
| Subjects: | |
| ISSN: | 0306-4379, 1873-6076 |
| Summary: | Fake news, hate speech and offensive language are related evil triplets currently affecting modern societies. Text has been the most widely used modality for the computational detection of these phenomena. In recent times, multimodal studies in this direction have attracted considerable interest because of the potential offered by other modalities in contributing to the detection of these menaces. However, a major problem in multimodal content understanding is how to effectively model the complementarity of the different modalities, given their diverse characteristics and features. From a multimodal point of view, the three tasks have been studied mainly using image and text modalities. Improving the effectiveness of the diverse multimodal approaches is still an open research topic. In addition to the traditional text and image modalities, we consider image-texts, which are rarely used in previous studies but which contain useful information for enhancing the effectiveness of a prediction model. To ease multimodal content understanding and enhance prediction, we leverage recent advances in computer vision and deep learning for these tasks. First, we unify the modalities by creating a text representation of the images and image-texts, in addition to the main text. Second, we propose a multi-layer deep neural network with an inter-modal attention mechanism to model the complementarity among these modalities. We conduct extensive experiments on three standard datasets covering the three tasks. Experimental results show that detection of fake news, hate speech and offensive language can benefit from this approach. Furthermore, we conduct robust ablation experiments to show the effectiveness of our approach. Our model predominantly outperforms prior works across the datasets. |
|---|---|
| Highlights: | • A unified deep learning model can be used for multimodal fake news, hate speech and offensive language detection. • Unifying modalities is useful for multimodal content understanding. • An inter-modal attention mechanism is effective for multimodal deep learning models. • The inter-modal attention deep learning framework is effective for fake news, hate speech and offensive language detection. • Incorporating image-texts as an additional modality improves performance; the model can be tuned to use the desired number of modalities. |
| DOI: | 10.1016/j.is.2024.102378 |
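The record does not include the model details, but the summary's core idea — letting one modality's representation attend over another's to model complementarity — can be illustrated with a minimal scaled dot-product cross-attention sketch. All function names, shapes and dimensions below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def inter_modal_attention(query_mod, key_mod):
    """One modality's token embeddings (query_mod, shape (n_q, d))
    attend over another modality's embeddings (key_mod, shape (n_k, d)).
    Returns the attended representation and the attention weights."""
    d = query_mod.shape[-1]
    scores = query_mod @ key_mod.T / np.sqrt(d)   # (n_q, n_k) affinities
    weights = softmax(scores, axis=-1)            # each query row sums to 1
    return weights @ key_mod, weights

# toy example: 3 text tokens attend over 4 image-derived-text tokens, dim 8
rng = np.random.default_rng(0)
text = rng.normal(size=(3, 8))
image_text = rng.normal(size=(4, 8))
attended, w = inter_modal_attention(text, image_text)
assert attended.shape == (3, 8)
assert np.allclose(w.sum(axis=1), 1.0)
```

In a full model along the lines the summary describes, such attended representations from each modality pair would feed further dense layers before the classification head; since the paper first unifies all modalities as text, both inputs here are text-token embeddings.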