Towards Accurate Detection of Offensive Language in Online Communication in Arabic
We present the results of predictive modelling for the detection of anti-social behaviour in online communication in Arabic, such as comments which contain obscene or offensive words and phrases. We collected and labelled a large dataset of YouTube comments in Arabic which contains a broad range of...
Gespeichert in:
| Veröffentlicht in: | Procedia computer science Jg. 142; S. 315 - 320 |
|---|---|
| Hauptverfasser: | , , |
| Format: | Journal Article |
| Sprache: | Englisch |
| Veröffentlicht: |
Elsevier B.V
2018
|
| Schlagworte: | |
| ISSN: | 1877-0509, 1877-0509 |
| Online-Zugang: | Volltext |
| Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
| Zusammenfassung: | We present the results of predictive modelling for the detection of anti-social behaviour in online communication in Arabic, such as comments which contain obscene or offensive words and phrases. We collected and labelled a large dataset of YouTube comments in Arabic which contains a broad range of both offensive and inoffensive comments. We used this dataset to train a Support Vector Machine classifier and experimented with combinations of word-level features, N-gram features and a variety of pre-processing techniques. We summarise the pre-processing steps and features that allow training a classifier which is more precise, with 90.05% accuracy, than classifiers reported by previous studies on Arabic text. |
|---|---|
| ISSN: | 1877-0509 1877-0509 |
| DOI: | 10.1016/j.procs.2018.10.491 |