Hate speech recognition in multilingual text: hinglish documents

Bibliographic details
Published in:	International journal of information technology (Singapore. Online) Vol. 15; No. 3; pp. 1319-1331
Main authors:	Yadav, Arun Kumar, Kumar, Mohit, Kumar, Abhishek, Shivani, Kusum, Yadav, Divakar
Format:	Journal Article
Language:	English
Published:	Singapore, Springer Nature Singapore, 1 March 2023; Springer Nature B.V
Subjects:	Accuracy; Artificial Intelligence; Computer Imaging; Computer Science; Datasets; Deep learning; Digital media; Embedding; Hate speech; Image Processing and Computer Vision; Internet; Literature reviews; Machine Learning; Misogyny; Original Research; Pattern Recognition and Graphics; Social networks; Software Engineering; Speech recognition; Violence; Vision; Words (language); India; CNN; BiLSTM; Word2Vec
ISSN:	2511-2104, 2511-2112
Online access:	Full text

Description
Abstract:	The Internet is a boon for mankind, but its misuse has been increasing drastically. Social networking platforms such as Facebook, Twitter and Instagram play a predominant role in letting users express their views. Users sometimes use abusive or inflammatory language that may provoke readers. This paper evaluates several machine learning and deep learning methods, along with various feature extraction and word-embedding techniques, for detecting hate speech in tweets and comments written in Hinglish (English-Hindi code-mixed language). The methods are applied to a consolidated dataset of 20600 instances. The experimental results show that deep learning models generally perform better than machine learning models. Among the deep learning models, the CNN-BiLSTM model with word2vec word embeddings provides the best results: 0.876 accuracy, 0.830 precision, 0.840 recall and 0.835 F1-score. These results surpass recent state-of-the-art approaches.
DOI:	10.1007/s41870-023-01211-z
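
Note on the reported metrics: the abstract gives precision, recall and F1-score for the CNN-BiLSTM model. As a quick consistency check, the short Python sketch below recomputes F1 as the harmonic mean of the reported precision and recall. The function name `f1_score` and the variable names are illustrative and do not come from the paper. The check also assumes the three figures describe the same class or the same averaging scheme, which the abstract does not state.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the F1-score)."""
    if precision + recall == 0:
        # Avoid division by zero when both inputs are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract for CNN-BiLSTM with word2vec.
reported_precision = 0.830
reported_recall = 0.840
reported_f1 = 0.835

computed_f1 = f1_score(reported_precision, reported_recall)
print(f"computed F1 = {computed_f1:.3f}, reported F1 = {reported_f1:.3f}")
```

Under that assumption, the recomputed value agrees with the reported F1-score to three decimal places.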