Towards Accurate Detection of Offensive Language in Online Communication in Arabic

Bibliographic details
Published in: Procedia Computer Science, Vol. 142, pp. 315-320
Main authors: Alakrot, Azalden; Murray, Liam; Nikolov, Nikola S.
Format: Journal Article
Language: English
Published: Elsevier B.V., 2018
ISSN: 1877-0509
Online access: Full text
Description
Summary: We present the results of predictive modelling for the detection of anti-social behaviour in online communication in Arabic, such as comments which contain obscene or offensive words and phrases. We collected and labelled a large dataset of YouTube comments in Arabic which contains a broad range of both offensive and inoffensive comments. We used this dataset to train a Support Vector Machine classifier and experimented with combinations of word-level features, N-gram features and a variety of pre-processing techniques. We summarise the pre-processing steps and features that allow training a classifier which is more precise, with 90.05% accuracy, than classifiers reported by previous studies on Arabic text.
DOI: 10.1016/j.procs.2018.10.491