Towards Accurate Detection of Offensive Language in Online Communication in Arabic

Bibliographic Details
Published in:	Procedia Computer Science, Vol. 142, pp. 315-320
Main authors:	Alakrot, Azalden; Murray, Liam; Nikolov, Nikola S.
Format:	Journal Article
Language:	English
Published:	Elsevier B.V., 2018
Subjects:	Anti-social behaviour online; Arabic dataset; harassment detection; offensive language detection; SVM for offensive language detection in Arabic; text mining
ISSN:	1877-0509
Online access:	Full text

Description
Abstract:	We present the results of predictive modelling for the detection of anti-social behaviour in online communication in Arabic, such as comments which contain obscene or offensive words and phrases. We collected and labelled a large dataset of YouTube comments in Arabic which contains a broad range of both offensive and inoffensive comments. We used this dataset to train a Support Vector Machine classifier and experimented with combinations of word-level features, N-gram features and a variety of pre-processing techniques. We summarise the pre-processing steps and features that allow training a classifier which is more precise, with 90.05% accuracy, than classifiers reported by previous studies on Arabic text.
DOI:	10.1016/j.procs.2018.10.491
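
The abstract mentions pre-processing and word-level N-gram features feeding a Support Vector Machine, but this record does not list the actual steps. The following is only a minimal sketch of what such a feature-extraction stage might look like. The normalisation rules (stripping diacritics and tatweel, unifying alef variants, collapsing elongated letters) are common choices for Arabic text and are assumptions here, not the authors' reported pipeline. The function names `normalise` and `word_ngrams` are hypothetical.

```python
import re
from collections import Counter

# Assumed normalisation rules; the paper's own steps are not given in this record.
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")   # short-vowel marks, shadda, sukun, tatweel
_ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")  # alef with madda / hamza above / hamza below
_ELONGATION = re.compile(r"(.)\1{2,}")                # same character three or more times in a row


def normalise(text: str) -> str:
    """Apply the assumed normalisation rules to a comment."""
    text = _DIACRITICS.sub("", text)
    text = _ALEF_VARIANTS.sub("\u0627", text)  # map variants to bare alef
    text = _ELONGATION.sub(r"\1", text)        # collapse elongated runs to one character
    return text


def word_ngrams(text: str, n_min: int = 1, n_max: int = 2) -> Counter:
    """Count whitespace-token N-grams for every N from n_min to n_max."""
    tokens = normalise(text).split()
    counts: Counter = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts
```

Counts like these could then be vectorised and passed to any SVM implementation. The feature combinations and classifier settings behind the reported 90.05% accuracy are described in the full text, not in this record.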