Towards Accurate Detection of Offensive Language in Online Communication in Arabic

Bibliographic Details
Published in: Procedia Computer Science, Vol. 142, pp. 315-320
Main Authors: Alakrot, Azalden; Murray, Liam; Nikolov, Nikola S.
Format: Journal Article
Language: English
Published: Elsevier B.V., 2018
ISSN: 1877-0509
Description
Summary: We present the results of predictive modelling for the detection of anti-social behaviour in online communication in Arabic, such as comments that contain obscene or offensive words and phrases. We collected and labelled a large dataset of YouTube comments in Arabic which contains a broad range of both offensive and inoffensive comments. We used this dataset to train a Support Vector Machine classifier and experimented with combinations of word-level features, N-gram features and a variety of pre-processing techniques. We summarise the pre-processing steps and features that allow training a classifier which, at 90.05% accuracy, is more precise than the classifiers reported by previous studies on Arabic text.
DOI: 10.1016/j.procs.2018.10.491
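
The summary mentions pre-processing combined with word-level and N-gram features, but this record does not list the actual steps. The following is a minimal sketch of what such a feature pipeline might look like, not the authors' method. The normalisation rules (stripping diacritics and tatweel, unifying alef variants, collapsing elongated letters, dropping non-Arabic characters) are assumptions based on common practice for Arabic text, and the names `normalise`, `word_ngrams` and `extract_features` are hypothetical.

```python
import re
from collections import Counter

# Assumed normalisation rules; the record does not specify the paper's steps.
# Diacritics (tashkeel, U+064B..U+0652) and the tatweel/kashida (U+0640).
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")
# Alef with madda / hamza above / hamza below, unified to bare alef (U+0627).
_ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")
# Any character repeated three or more times in a row (letter elongation).
_ELONGATION = re.compile(r"(.)\1{2,}")
# Everything that is neither an Arabic letter (U+0621..U+064A) nor whitespace.
_NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")


def normalise(text):
    """Return a normalised copy of an Arabic comment (hypothetical helper)."""
    text = _DIACRITICS.sub("", text)           # drop diacritics and tatweel
    text = _ALEF_VARIANTS.sub("\u0627", text)  # unify alef forms
    text = _ELONGATION.sub(r"\1", text)        # collapse elongated letters
    text = _NON_ARABIC.sub(" ", text)          # digits, Latin, punctuation
    return " ".join(text.split())              # squeeze whitespace


def word_ngrams(tokens, n):
    """Return all contiguous word n-grams, joined by a single space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(comment, max_n=2):
    """Count word n-grams for n = 1..max_n after normalisation."""
    tokens = normalise(comment).split()
    features = Counter()
    for n in range(1, max_n + 1):
        features.update(word_ngrams(tokens, n))
    return features
```

Counts of this kind could then be vectorised and passed to a Support Vector Machine implementation; that step is omitted here because the record gives no details of the classifier's configuration.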