An inter-modal attention-based deep learning framework using unified modality for multimodal fake news, hate speech and offensive language detection

Bibliographic Details
Published in: Information Systems (Oxford), Vol. 123, p. 102378
Main Authors: Ayetiran, Eniafe Festus, Özgöbek, Özlem
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.07.2024
Subjects:
ISSN: 0306-4379, 1873-6076
Online Access: Full text
Description
Abstract: Fake news, hate speech and offensive language are a related triplet of evils currently affecting modern societies. The text modality has been widely used for the computational detection of these phenomena. In recent times, multimodal studies in this direction have been attracting considerable interest because of the potential of other modalities to contribute to the detection of these menaces. However, a major problem in multimodal content understanding is how to effectively model the complementarity of the different modalities, given their diverse characteristics and features. From a multimodal point of view, the three tasks have been studied mainly using the image and text modalities, and improving the effectiveness of the diverse multimodal approaches is still an open research topic. In addition to the traditional text and image modalities, we consider image-texts (text embedded within images), which are rarely used in previous studies but contain useful information for enhancing the effectiveness of a prediction model. To ease multimodal content understanding and enhance prediction, we leverage recent advances in computer vision and deep learning for these tasks. First, we unify the modalities by creating a text representation of the images and image-texts, in addition to the main text. Second, we propose a multi-layer deep neural network with an inter-modal attention mechanism to model the complementarity among these modalities. We conduct extensive experiments on three standard datasets covering the three tasks. Experimental results show that the detection of fake news, hate speech and offensive language benefits from this approach. Furthermore, we conduct robust ablation experiments to demonstrate the effectiveness of our approach. Our model predominantly outperforms prior works across the datasets.

Highlights:
• A unified deep learning model can be used for multimodal fake news, hate speech and offensive language detection.
• Unifying modalities is useful for multimodal content understanding.
• An inter-modal attention mechanism is effective for multimodal deep learning models.
• The inter-modal attention deep learning framework is effective for fake news, hate speech and offensive language detection.
• Incorporating image-texts as an additional modality improves performance; the model can be tuned to use the desired number of modalities.
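The authors' implementation is not part of this record. Purely to illustrate the kind of inter-modal attention over unified (text-form) modalities that the abstract describes, here is a minimal PyTorch sketch; every name and design choice in it (InterModalAttention, d_model=256, mean pooling, a two-class head) is a hypothetical assumption, not the paper's architecture.

```python
import torch
import torch.nn as nn

class InterModalAttention(nn.Module):
    """Sketch: cross-modal attention over three unified text modalities.

    Assumes the main text, a generated image description, and the
    OCR'd image-text have each already been encoded into sequences of
    d_model-dimensional vectors. Hyperparameters are illustrative only.
    """

    def __init__(self, d_model: int = 256, n_heads: int = 4, n_classes: int = 2):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.classifier = nn.Sequential(
            nn.Linear(3 * d_model, d_model),
            nn.ReLU(),
            nn.Linear(d_model, n_classes),  # e.g. fake/real, hate/non-hate
        )

    def forward(self, text, caption, ocr):
        # Each tensor: (batch, seq_len, d_model), already encoded.
        mods = [text, caption, ocr]
        fused = []
        for i, query in enumerate(mods):
            # The other two modalities supply keys/values for this query.
            kv = torch.cat([m for j, m in enumerate(mods) if j != i], dim=1)
            attended, _ = self.attn(query, kv, kv)
            fused.append(attended.mean(dim=1))  # pool over the sequence axis
        return self.classifier(torch.cat(fused, dim=-1))

# Dummy batch: 8 posts; main text, image description, and image-text.
model = InterModalAttention()
text, caption, ocr = (torch.randn(8, n, 256) for n in (32, 16, 16))
print(model(text, caption, ocr).shape)  # torch.Size([8, 2])
```

In this sketch each modality in turn acts as the attention query while the remaining two supply keys and values, which is one plausible way to model the complementarity among modalities that the abstract emphasizes; because all modalities have been unified into text, a single encoder and attention module can serve all three inputs.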
DOI: 10.1016/j.is.2024.102378