Detecting Arabic Offensive Language in Microblogs Using Domain-Specific Word Embeddings and Deep Learning


Bibliographic details
Published in: Tehnički glasnik, Vol. 16, No. 3, pp. 394-400
Main authors: Khulood O. Aljuhani, Khaled H. Alyoubi, Fahd S. Alotaibi
Format: Journal Article
Language: English
Published: University North, 21 June 2022
ISSN:1846-6168, 1848-5588
Description
Summary: In recent years, social media networks have emerged as key players, providing platforms for expressing opinions, communicating, and distributing content. However, users often take advantage of perceived anonymity on these platforms to share offensive or hateful content. Offensive language has thus become a significant issue as online communication has increased and social media platforms have grown in popularity. This problem has drawn considerable attention to methods for detecting offensive content and preventing its spread on online social networks. This paper therefore aims to develop an effective Arabic offensive language detection model using deep learning with semantic and contextual features. It proposes a deep learning approach that combines a bidirectional long short-term memory (BiLSTM) model with domain-specific word embeddings extracted from an Arabic offensive dataset. The detection approach was evaluated on an Arabic dataset collected from Twitter. The highest accuracy, 0.93 (93%), was achieved by the BiLSTM model trained on a combination of domain-specific and domain-agnostic word embeddings.
DOI:10.31803/tg-20220305120018
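
Illustrative note: the summary reports that the best model was trained on a combination of domain-specific and domain-agnostic word embeddings, but this record does not say how the two were combined. The sketch below shows one common option, per-word concatenation with zero vectors for words missing from either table. This is an assumption, not the authors' documented method. The function name `combine_embeddings`, the toy vocabulary, and the vector sizes are all hypothetical, and the BiLSTM classifier itself is not shown.

```python
from typing import Dict, List

Vector = List[float]


def combine_embeddings(
    vocab: List[str],
    domain_specific: Dict[str, Vector],
    domain_agnostic: Dict[str, Vector],
    specific_dim: int,
    agnostic_dim: int,
) -> Dict[str, Vector]:
    """Concatenate two embedding tables word by word.

    Assumption: a word missing from one table gets a zero vector
    of that table's dimensionality (hypothetical choice).
    """
    combined: Dict[str, Vector] = {}
    for word in vocab:
        specific = domain_specific.get(word, [0.0] * specific_dim)
        agnostic = domain_agnostic.get(word, [0.0] * agnostic_dim)
        # List concatenation yields a vector of length specific_dim + agnostic_dim.
        combined[word] = specific + agnostic
    return combined


# Toy data (hypothetical): 2-d domain-specific and 3-d domain-agnostic vectors.
domain_specific = {"w1": [0.1, 0.2], "w2": [0.3, 0.4]}
domain_agnostic = {"w1": [1.0, 2.0, 3.0], "w3": [4.0, 5.0, 6.0]}

combined = combine_embeddings(
    ["w1", "w2", "w3"], domain_specific, domain_agnostic, 2, 3
)
```

In a pipeline of this kind, the combined vectors would typically initialise the embedding layer that feeds the BiLSTM; other combination strategies, such as averaging vectors of equal size, are equally plausible.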