An inter-modal attention-based deep learning framework using unified modality for multimodal fake news, hate speech and offensive language detection
| Published in: | Information systems (Oxford) Vol. 123; p. 102378 |
|---|---|
| Main Authors: | , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Ltd, 01.07.2024 |
| Subjects: | |
| ISSN: | 0306-4379, 1873-6076 |
| Summary: | Fake news, hate speech and offensive language are related evil triplets currently affecting modern societies. Text has been the most widely used modality for the computational detection of these phenomena. In recent times, multimodal studies in this direction have attracted considerable interest because of the potential offered by other modalities in contributing to the detection of these menaces. However, a major problem in multimodal content understanding is how to effectively model the complementarity of the different modalities, given their diverse characteristics and features. From a multimodal point of view, the three tasks have been studied mainly using image and text modalities. Improving the effectiveness of the diverse multimodal approaches is still an open research topic. In addition to the traditional text and image modalities, we consider image-texts, which are rarely used in previous studies but which contain useful information for enhancing the effectiveness of a prediction model. To ease multimodal content understanding and enhance prediction, we leverage recent advances in computer vision and deep learning for these tasks. First, we unify the modalities by creating a text representation of the images and image-texts, in addition to the main text. Second, we propose a multi-layer deep neural network with an inter-modal attention mechanism to model the complementarity among these modalities. We conduct extensive experiments on three standard datasets covering the three tasks. Experimental results show that detection of fake news, hate speech and offensive language can benefit from this approach. Furthermore, we conduct robust ablation experiments to show the effectiveness of our approach. Our model predominantly outperforms prior works across the datasets. |
|---|---|
| Highlights: | • A unified deep learning model can be used for multimodal fake news, hate speech and offensive language detection. • Unifying modalities is useful for multimodal content understanding. • An inter-modal attention mechanism is effective for multimodal deep learning models. • The inter-modal attention deep learning framework is effective for fake news, hate speech and offensive language detection. • Incorporating image-texts as an additional modality improves performance; the model can be tuned to use the desired number of modalities. |
| DOI: | 10.1016/j.is.2024.102378 |
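The record does not include the model details, but the summary's core idea — letting one modality's representation attend over another's to model complementarity — can be illustrated with a minimal scaled dot-product cross-attention sketch. All function names, shapes and dimensions below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def inter_modal_attention(query_mod, key_mod):
    """One modality's token embeddings (query_mod, shape (n_q, d))
    attend over another modality's embeddings (key_mod, shape (n_k, d)).
    Returns the attended representation and the attention weights."""
    d = query_mod.shape[-1]
    scores = query_mod @ key_mod.T / np.sqrt(d)   # (n_q, n_k) affinities
    weights = softmax(scores, axis=-1)            # each query row sums to 1
    return weights @ key_mod, weights

# toy example: 3 text tokens attend over 4 image-derived-text tokens, dim 8
rng = np.random.default_rng(0)
text = rng.normal(size=(3, 8))
image_text = rng.normal(size=(4, 8))
attended, w = inter_modal_attention(text, image_text)
assert attended.shape == (3, 8)
assert np.allclose(w.sum(axis=1), 1.0)
```

In a full model along the lines the summary describes, such attended representations from each modality pair would feed further dense layers before the classification head; since the paper first unifies all modalities as text, both inputs here are text-token embeddings.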