A probabilistic clustering model for hate speech classification in twitter

Bibliographic Details
Published in:Expert systems with applications Vol. 173; p. 114762
Main Authors: Ayo, Femi Emmanuel, Folorunso, Olusegun, Ibharalu, Friday Thomas, Osinuga, Idowu Ademola, Abayomi-Alli, Adebayo
Format: Journal Article
Language:English
Published: New York Elsevier Ltd 01.07.2021
Elsevier BV
ISSN:0957-4174, 1873-6793
Description
Summary:
• A probabilistic clustering model for hate speech classification on Twitter was developed.
• A naïve Bayes model is used to improve feature representation.
• A modified Jaccard similarity measure is used to cluster real-time tweets into topic clusters.
• A 4-level-scale fuzzy model is used for hate speech classification.
• The Paired Sample t-Test validated the efficiency of the developed model.

The key challenges for automatic hate speech classification on Twitter are the lack of a generic architecture, imprecision, threshold settings, and fragmentation issues. Most studies have used binary classifiers for hate speech classification, but these classifiers cannot capture other emotions that may overlap between the positive and negative classes. Hence, a probabilistic clustering model for hate speech classification on Twitter was developed to tackle these problems. A metadata extractor was used to collect tweets containing hate speech keywords, and crowd-sourced experts were employed to label the collected tweets into two categories: hate speech and non-hate speech. Feature representation was done with a Term Frequency-Inverse Document Frequency (TF-IDF) model and enhanced with topics inferred by a Bayes classifier. A rule-based clustering method was used to automatically assign real-time tweets to the correct topic clusters. Fuzzy logic was then used for hate speech classification, using semantic fuzzy rules and a score computation module. In the evaluation, the developed model performed better in hate speech detection, with an F1-score of 0.9256 using 5-fold cross-validation. Similarly, the developed model for hate speech classification performed better than related models, with an F1-score of 91.5. The developed model also showed better discriminative ability than similar methods, with an AUC of 0.9645. The Paired Sample t-Test validated the efficiency of the developed model for hate speech classification.
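The clustering step mentioned in the summary can be illustrated with a minimal sketch. The record does not describe how the authors modified the Jaccard measure, so the code below uses the standard Jaccard similarity instead. The topic keyword sets, the threshold value, and the function names (`tokenize`, `jaccard`, `assign_topic`) are all hypothetical and chosen only for illustration.

```python
import re

def tokenize(text):
    """Lowercase a tweet and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Standard Jaccard similarity |A ∩ B| / |A ∪ B| (not the paper's modified measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def assign_topic(tweet, topics, threshold=0.1):
    """Assign a tweet to the topic whose keyword set is most similar.

    Returns (topic_name, score); topic_name is None when no topic
    reaches the threshold. The threshold is an assumed value.
    """
    tokens = tokenize(tweet)
    best_topic, best_score = None, 0.0
    for name, keywords in topics.items():
        score = jaccard(tokens, keywords)
        if score > best_score:
            best_topic, best_score = name, score
    if best_score >= threshold:
        return best_topic, best_score
    return None, best_score

# Hypothetical topic clusters, each represented by a small keyword set.
TOPICS = {
    "sports": {"match", "goal", "team", "league"},
    "politics": {"election", "vote", "senate", "policy"},
}

if __name__ == "__main__":
    print(assign_topic("The team scored a late goal", TOPICS))
```

In this sketch a tweet that shares no keywords with any topic stays unassigned. The rule-based method in the article may handle that case differently.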
DOI:10.1016/j.eswa.2021.114762