Fine tuning large language models for hate speech detection in Hinglish and code mixed custom dataset through a socially responsible approach for safer digital platforms.
| Title: | Fine tuning large language models for hate speech detection in Hinglish and code mixed custom dataset through a socially responsible approach for safer digital platforms. |
|---|---|
| Authors: | Rathore, Bhawani Singh; Chaurasia, Sandeep |
| Source: | Discover Sustainability; 12/22/2025, Vol. 6 Issue 1, p1-27, 27p |
| Subject Terms: | LANGUAGE models, CODE switching (Linguistics), ETHICAL problems, CLASSIFICATION, DISCRIMINATORY language |
| Abstract: | This paper presents an extensive review and experimental framework for hate speech detection using large language models (LLMs). We explore traditional machine learning approaches and illustrate how state-of-the-art LLMs, including BERT variants (DistilBERT and ModernBERT) and more recent models such as DeepSeek R1, Llama 3, Gemma 3, Mistral, and Phi 3.5, enhance text classification performance. We evaluate the effectiveness of these LLMs for Hindi–English (Hinglish) code-mixed hate speech detection on a custom dataset built from Reddit posts, which were auto-labeled and subsequently verified and validated by expert annotators. LoRA adapters were used for fine-tuning, and in-domain performance was evaluated with precision, recall, F1-score, and accuracy as metrics. To ensure a fair comparison, all models were trained for the same number of epochs under identical training conditions. Our findings show that DeepSeek R1 outperformed larger general-purpose LLMs such as Llama 3 and Gemma 3, albeit by small margins, achieving the highest accuracy of 79% and demonstrating a better grasp of the complexity of code-mixed hate speech. These results indicate that lightweight, responsibly fine-tuned LLMs can strengthen moderation for multilingual, code-mixed communities without prohibitive computing costs. In practice, the approach offers a clear path towards safer, more inclusive platforms by pairing accuracy with bias-aware development and human oversight. [ABSTRACT FROM AUTHOR] |
| Copyright: | Copyright of Discover Sustainability is the property of Springer Nature and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.) |
| Database: | Complementary Index |
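
The abstract names LoRA adapters for fine-tuning and precision, recall, F1-score, and accuracy as the evaluation metrics. Below is a minimal sketch of that kind of setup using the Hugging Face `transformers` and `peft` libraries with DistilBERT (one of the BERT variants the paper evaluates). The dataset file, column names, and all hyperparameters are illustrative assumptions, not the authors' actual configuration.

```python
# Minimal LoRA fine-tuning sketch for binary hate speech classification.
# Assumptions (not from the paper): a file "hinglish_hate.csv" with "text"
# and "label" columns, and the LoRA hyperparameters chosen below.
import numpy as np
from datasets import load_dataset
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)
from peft import LoraConfig, TaskType, get_peft_model

MODEL = "distilbert-base-uncased"  # one of the BERT variants the paper evaluates

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

# Wrap the classifier with low-rank adapters; only adapter (and classifier
# head) weights are trained, which keeps compute costs low.
lora = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8, lora_alpha=16, lora_dropout=0.1,  # illustrative values
    target_modules=["q_lin", "v_lin"],     # DistilBERT attention projections
)
model = get_peft_model(model, lora)

ds = load_dataset("csv", data_files="hinglish_hate.csv")["train"].train_test_split(0.2)
ds = ds.map(lambda b: tokenizer(b["text"], truncation=True,
                                padding="max_length", max_length=128),
            batched=True)

def compute_metrics(eval_pred):
    # The four metrics the abstract reports.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    p, r, f1, _ = precision_recall_fscore_support(labels, preds, average="binary")
    return {"accuracy": accuracy_score(labels, preds),
            "precision": p, "recall": r, "f1": f1}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out",
                           num_train_epochs=3,  # same epoch budget for every model, per the abstract
                           per_device_train_batch_size=16,
                           learning_rate=2e-4),
    train_dataset=ds["train"],
    eval_dataset=ds["test"],
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())
```

Swapping `MODEL` for another checkpoint (with that model's attention module names in `target_modules`) and holding the epoch count and training arguments fixed would mirror the fair-comparison protocol the abstract describes.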