The dialects gap: A multi-task learning approach for enhancing hate speech detection in Arabic dialects

Bibliographic details
Published in: Expert systems with applications, Volume 295; p. 128584
Main authors: Abdelsamie, Mahmoud Mohamed; Azab, Shahira Shaaban; Hefny, Hesham A.
Format: Journal Article
Language: English
Published: Elsevier Ltd 01.01.2026
ISSN:0957-4174
Description
Summary: Hate speech is a complex and often debated concept within Arabic dialects. Detecting hate speech in Arabic poses unique challenges because its diverse dialects vary linguistically in both meaning and context. Previous studies have often combined multiple Arabic dialects in a single corpus without specifying which dialects were used, which is problematic because it can lead to misidentification of hateful and non-hateful contexts specific to a particular dialect. This research therefore aims to address the ambiguity caused by dialectal variation, which has led to polarity misidentification in previous studies that often fail to distinguish between contexts or terms that share the same form but carry different meanings across Arabic dialects. In this paper, we propose a multi-task learning approach built on the transformer architecture to bridge this gap in hate speech detection across Arabic dialects. Using publicly available datasets from various dialects, the proposed model is designed to identify and distinguish subtle hate speech patterns and to use shared representation knowledge across five Arabic dialects: Egyptian, Saudi, Levant, Gulf, and Algerian. To the best of our knowledge, it is the first model to simultaneously address multiple dialects and recognize hate speech by using the distinctive characteristics of each dialect. Our findings show that the proposed model makes a significant contribution to advancing hate speech detection in the Arabic language, surpassing single-task models. It achieved F1 scores of 0.98, 0.84, 0.85, 0.76, and 0.80 for the Egyptian, Levant, Saudi, Algerian, and Gulf dialects, respectively, representing an overall improvement of 14% compared to previous research. These results demonstrate the effectiveness of our approach, showing not only high performance but also an accurate understanding of dialect-specific hate speech.
DOI:10.1016/j.eswa.2025.128584