Fine tuning large language models for hate speech detection in Hinglish and code mixed custom dataset through a socially responsible approach for safer digital platforms.
Saved in:
| Title: | Fine tuning large language models for hate speech detection in Hinglish and code mixed custom dataset through a socially responsible approach for safer digital platforms. |
|---|---|
| Authors: | Rathore, Bhawani Singh; Chaurasia, Sandeep |
| Source: | Discover Sustainability; 12/22/2025, Vol. 6 Issue 1, p1-27, 27p |
| Subject Terms: | LANGUAGE models, CODE switching (Linguistics), ETHICAL problems, CLASSIFICATION, DISCRIMINATORY language |
| Abstract: | This paper presents an extensive review and experimental framework for hate speech detection using large language models (LLMs). We explored traditional machine learning approaches and illustrated how state-of-the-art LLMs, such as BERT variants (DistilBERT and ModernBERT), and recent developments, such as DeepSeek R1, Llama 3, Gemma 3, Mistral, and Phi 3.5, enhance text classification performance. In this study, we evaluated the effectiveness of various LLMs for Hindi–English code-mixed hate speech detection. We constructed a custom dataset from Reddit posts, which were auto-labeled and subsequently verified and validated by expert annotators. LoRA adapters were used for model fine-tuning and in-domain performance evaluation, with precision, recall, F1-score, and accuracy as metrics. To ensure a fair comparison, all models were trained for the same number of epochs under the same training conditions. According to our findings, DeepSeek R1 outperformed larger general-purpose LLMs, such as Llama 3 and Gemma 3, albeit by small margins. DeepSeek R1 achieved the highest accuracy of 79 percent, demonstrating a better grasp of the complexity of code-mixed hate speech detection. These results indicate that lightweight, responsibly fine-tuned LLMs can strengthen moderation for multilingual, code-mixed communities without prohibitive computing costs. In practice, the approach offers a clear path towards safer, more inclusive platforms by pairing accuracy with bias-aware development and human oversight. [ABSTRACT FROM AUTHOR] |
| Copyright: | Copyright of Discover Sustainability is the property of Springer Nature and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.) |
| Database: | Complementary Index |
| Full Text: | https://resolver.ebscohost.com/openurl?sid=EBSCO:edb&genre=article&issn=26629984&ISBN=&volume=6&issue=1&date=20251222&spage=1&pages=1-27&title=Discover Sustainability&atitle=Fine%20tuning%20large%20language%20models%20for%20hate%20speech%20detection%20in%20Hinglish%20and%20code%20mixed%20custom%20dataset%20through%20a%20socially%20responsible%20approach%20for%20safer%20digital%20platforms.&aulast=Rathore%2C%20Bhawani%20Singh&id=DOI:10.1007/s43621-025-02190-w (Full Text Finder) |
| Web of Science: | https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=EBSCO&SrcAuth=EBSCO&DestApp=WOS&ServiceName=TransferToWoS&DestLinkType=GeneralSearchSummary&Func=Links&author=Rathore%20BS (Find this article in Web of Science) |
| Permalink: | https://erproxy.cvtisr.sk/sfx/access?url=https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edb&AN=190407721 |
| DOI: | 10.1007/s43621-025-02190-w |
| ISSN (print): | 26629984 |
| Language: | English |
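The abstract reports model comparison using precision, recall, F1-score, and accuracy. As a minimal illustrative sketch only (toy binary labels, not the paper's dataset, models, or code), these four metrics can be computed from hate (1) / non-hate (0) predictions as follows:

```python
# Compute the four evaluation metrics named in the abstract from
# binary gold labels and model predictions (1 = hate, 0 = non-hate).
def classification_metrics(gold, pred):
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    tn = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(gold)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}

# Hypothetical toy labels for illustration, not from the paper.
gold = [1, 1, 0, 0, 1, 0]
pred = [1, 0, 0, 1, 1, 0]
m = classification_metrics(gold, pred)
```

In practice a library implementation (e.g. scikit-learn's classification metrics) would be used; this sketch only shows what each reported number measures.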