Machine learning techniques for hate speech classification of twitter data: State-of-the-art, future challenges and research directions

Detailed bibliography
Published in: Computer Science Review, Volume 38, p. 100311
Main authors: Ayo, Femi Emmanuel, Folorunso, Olusegun, Ibharalu, Friday Thomas, Osinuga, Idowu Ademola
Medium: Journal Article
Language: English
Published: Elsevier Inc, 1 November 2020
ISSN:1574-0137, 1876-7745
Description
Summary: Twitter is a microblogging tool that allows the creation of big data through short digital content. This study provides a survey of machine learning techniques for hate speech classification of Twitter data streams. Hate speech classification in Twitter data streams has remained a vibrant research focus, but little research effort has been devoted to the design of a generic metadata architecture, to threshold settings and to fragmentation issues. The hate speech classification techniques presented in the literature address some of the challenges inherent in Twitter data streams but remain limited with respect to these issues. This study presents a collection of hate speech benchmark datasets suitable for testing the efficiency of classification models. It also presents the pros and cons of single and hybrid machine learning methods for hate speech classification, together with a summary of the performance evaluation of the surveyed methods. In addition, the study presents a generic metadata architecture for hate speech classification in Twitter that tackles the issues with Twitter data streams. For hate speech detection, the developed generic metadata architecture was observed to perform better than similar methods across all evaluation metrics, with an accuracy of 0.95, a precision of 0.93, a recall of 0.92 and an F1-score of 0.93. Similarly, for hate speech sentiment classification it performed better than related methods, with an F1-score of 91.5%. With an AUC of 0.97, the developed architecture also came closer to a perfect test than similar methods. Statistical validation of the results points to the efficiency of the developed system. Finally, the results also showed that the developed system is well suited to automatic topic detection and categorization.
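The summary reports accuracy, precision, recall, F1-score and AUC without restating their definitions. The following is a minimal sketch of the standard binary definitions, offered only as an illustration and not as the authors' evaluation code. It assumes label 1 means "hate speech" and 0 means "not hate speech"; the function names `classification_metrics` and `roc_auc` and the toy labels and scores in the usage lines are hypothetical. AUC is computed here as the probability that a randomly chosen positive tweet is scored above a randomly chosen negative one, with ties counted as one half.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Standard binary metrics; assumes 1 = hate speech (positive class), 0 = not hate speech."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # hateful, flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign, not flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign, wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # hateful, missed
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy labels, hard predictions and classifier scores for 8 tweets.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 1, 1, 1]
    scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]
    print(classification_metrics(y_true, y_pred))
    print(roc_auc(y_true, scores))
```

With these toy values the classifier finds 3 of the 4 hateful tweets (recall 0.75) but raises 2 false alarms (precision 0.6), so the harmonic-mean F1 (about 0.667) sits below the simple average of the two. The pairwise AUC loop is quadratic in the number of examples; for large tweet collections a rank-based computation or a library routine is the usual choice.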
• This study presents a collection of hate speech benchmark datasets.
• It presents the pros and cons of single and hybrid machine learning methods for hate speech classification.
• It summarizes the performance evaluation of the surveyed machine learning methods.
• It presents a generic metadata architecture for hate speech classification of Twitter data.
• The results showed that the developed generic metadata model is well suited to topic detection and categorization.
DOI:10.1016/j.cosrev.2020.100311