The dialects gap: A multi-task learning approach for enhancing hate speech detection in Arabic dialects

Bibliographic Details
Published in: Expert Systems with Applications, Vol. 295, p. 128584
Main Authors: Abdelsamie, Mahmoud Mohamed, Azab, Shahira Shaaban, Hefny, Hesham A.
Format: Journal Article
Language: English
Published: Elsevier Ltd 01.01.2026
ISSN: 0957-4174
Description
Summary: Hate speech is a complex and often debated concept within Arabic dialects. Detecting hate speech in Arabic poses unique challenges because its diverse dialects vary linguistically, in both meaning and context. Previous studies have often combined multiple Arabic dialects in a single corpus without specifying which dialects were used, which is problematic because it can lead to misidentification of hateful and non-hateful contexts specific to a particular dialect. This research therefore addresses the ambiguity caused by dialectal variation, which has led to polarity misidentification in previous studies that fail to distinguish between contexts or terms that share the same form but carry different meanings across Arabic dialects. In this paper, we propose a multi-task learning approach built on a transformer architecture to bridge this gap in hate speech detection across Arabic dialects. Using publicly available datasets from various dialects, the proposed model is designed to identify and distinguish subtle hate speech patterns and to use shared representation knowledge across five Arabic dialects: Egyptian, Saudi, Levant, Gulf, and Algerian. To the best of our knowledge, it is the first model to address multiple dialects simultaneously and to recognize hate speech by using the distinctive characteristics of each dialect. Our findings show that the proposed model advances hate speech detection in Arabic, surpassing single-task models. It achieved F1 scores of 0.98, 0.84, 0.85, 0.76, and 0.80 for the Egyptian, Levant, Saudi, Algerian, and Gulf dialects, respectively, an overall improvement of 14% compared to previous research. These results demonstrate both high performance and an accurate understanding of dialect-specific hate speech.
DOI: 10.1016/j.eswa.2025.128584