A probabilistic clustering model for hate speech classification in twitter
| Published in: | Expert systems with applications Vol. 173; p. 114762 |
|---|---|
| Main Authors: | , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | New York; Elsevier Ltd; 01.07.2021; Elsevier BV |
| Subjects: | |
| ISSN: | 0957-4174, 1873-6793 |
| Summary: | • A probabilistic clustering model for hate speech classification in Twitter was developed. • A naïve Bayes model was used to improve feature representation. • A modified Jaccard similarity measure was used to cluster real-time tweets into topic clusters. • A 4-level scale fuzzy model was used for hate speech classification. • The Paired Sample t-Test validated the efficiency of the developed model.
The key challenges for automatic hate speech classification in Twitter are the lack of a generic architecture, imprecision, threshold settings and fragmentation issues. Most studies used binary classifiers for hate speech classification, but these classifiers cannot capture other emotions that may overlap between the positive and negative classes. Hence, a probabilistic clustering model for hate speech classification in Twitter was developed to tackle these problems. A metadata extractor was used to collect tweets containing hate speech keywords, and crowd-sourced experts were employed to label the collected tweets into two categories: hate speech and non-hate speech. Feature representation was done with a Term Frequency-Inverse Document Frequency (TF-IDF) model and enhanced with topics inferred by a Bayes classifier. A rule-based clustering method was used to automatically classify real-time tweets into the correct topic clusters. Fuzzy logic was then used for hate speech classification using semantic fuzzy rules and a score computation module. In the evaluation, the developed model performed better in hate speech detection, with an F1-score of 0.9256 using 5-fold cross-validation. Similarly, the developed model for hate speech classification performed better than related models, with an F1-score of 91.5. The developed model also discriminated better than similar methods, with an AUC of 0.9645. The Paired Sample t-Test validated the efficiency of the developed model for hate speech classification. |
|---|---|
| DOI: | 10.1016/j.eswa.2021.114762 |
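
The summary mentions clustering real-time tweets into topic clusters with a modified Jaccard similarity measure. This record does not describe the modification, so the minimal Python sketch below uses the standard Jaccard index instead. The function names, the keyword sets and the 0.2 threshold are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
def jaccard(a, b):
    """Standard Jaccard index |a ∩ b| / |a ∪ b|; returns 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign_cluster(tweet, clusters, threshold=0.2):
    """Return the best-matching topic name, or None if no score reaches the threshold.

    `clusters` maps a topic name to a set of keywords (hypothetical data layout).
    """
    # Naive whitespace tokenisation; a real pipeline would clean and normalise tweets.
    tokens = set(tweet.lower().split())
    best_topic, best_score = None, 0.0
    for topic, keywords in clusters.items():
        score = jaccard(tokens, keywords)
        if score > best_score:
            best_topic, best_score = topic, score
    return best_topic if best_score >= threshold else None


# Hypothetical topic clusters, for illustration only.
clusters = {
    "sports": {"match", "goal", "team", "score"},
    "politics": {"election", "vote", "party", "senate"},
}
print(assign_cluster("the team scored a late goal", clusters))
```

A tweet that shares no keywords with any cluster falls below the threshold and is left unassigned, which is one simple way to handle the threshold-setting issue the summary lists among the key challenges.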