Fine tuning large language models for hate speech detection in Hinglish and code mixed custom dataset through a socially responsible approach for safer digital platforms.

Saved in:
Bibliographic Details
Title: Fine tuning large language models for hate speech detection in Hinglish and code mixed custom dataset through a socially responsible approach for safer digital platforms.
Authors: Rathore, Bhawani Singh, Chaurasia, Sandeep
Source: Discover Sustainability; 12/22/2025, Vol. 6 Issue 1, p1-27, 27p
Subject Terms: LANGUAGE models, CODE switching (Linguistics), ETHICAL problems, CLASSIFICATION, DISCRIMINATORY language
Abstract: This paper presents an extensive review and experimental framework for hate speech detection using large language models (LLMs). We explored traditional machine learning approaches and illustrated how state-of-the-art LLMs, such as BERT variants (DistilBERT and ModernBERT) and recent developments such as DeepSeek R1, Llama 3, Gemma 3, Mistral, and Phi 3.5, enhance text classification performance. In this study, we evaluated the effectiveness of various LLMs for Hindi–English code-mixed hate speech detection. We constructed a custom dataset from Reddit posts, which were auto-labeled and subsequently verified and validated by expert annotators. LoRA adapters were used for model fine-tuning and in-domain performance evaluation, with precision, recall, F1-score, and accuracy as metrics. To ensure a fair comparison, all the models were trained for the same number of epochs under the same training conditions. According to our findings, DeepSeek R1 outperformed larger general-purpose LLMs, such as Llama 3 and Gemma 3, albeit by small margins. DeepSeek R1 surpassed the other models with the highest accuracy of 79 percent, demonstrating a better understanding of the complexity of code-mixed hate speech detection. These results indicate that lightweight, responsibly fine-tuned LLMs can strengthen moderation for multilingual, code-mixed communities without prohibitive computing costs. In practice, the approach offers a clear path towards safer, more inclusive platforms by pairing accuracy with bias-aware development and human oversight. [ABSTRACT FROM AUTHOR]
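The abstract evaluates models with precision, recall, F1-score, and accuracy. As a minimal illustrative sketch (not the authors' code, and with hypothetical labels), these four binary-classification metrics can be computed as:

```python
# Minimal sketch: the four evaluation metrics named in the abstract
# (precision, recall, F1-score, accuracy) for a binary hate-speech classifier.
# Labels and predictions below are hypothetical examples.

def classification_metrics(y_true, y_pred, positive=1):
    """Return (precision, recall, f1, accuracy) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = correct / len(y_true)
    return precision, recall, f1, accuracy

# Hypothetical gold labels and model predictions: 1 = hate, 0 = not hate
gold = [1, 0, 1, 1, 0, 0, 1, 0]
pred = [1, 0, 0, 1, 0, 1, 1, 0]
p, r, f1, acc = classification_metrics(gold, pred)
```

F1 is the harmonic mean of precision and recall, so reporting all four, as the paper does, guards against a classifier that inflates accuracy by favoring the majority (non-hate) class.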
Copyright of Discover Sustainability is the property of Springer Nature and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Complementary Index
FullText Text:
  Availability: 0
CustomLinks:
  – Url: https://resolver.ebscohost.com/openurl?sid=EBSCO:edb&genre=article&issn=26629984&ISBN=&volume=6&issue=1&date=20251222&spage=1&pages=1-27&title=Discover Sustainability&atitle=Fine%20tuning%20large%20language%20models%20for%20hate%20speech%20detection%20in%20Hinglish%20and%20code%20mixed%20custom%20dataset%20through%20a%20socially%20responsible%20approach%20for%20safer%20digital%20platforms.&aulast=Rathore%2C%20Bhawani%20Singh&id=DOI:10.1007/s43621-025-02190-w
    Name: Full Text Finder
    Category: fullText
    Text: Full Text Finder
    Icon: https://imageserver.ebscohost.com/branding/images/FTF.gif
    MouseOverText: Full Text Finder
  – Url: https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=EBSCO&SrcAuth=EBSCO&DestApp=WOS&ServiceName=TransferToWoS&DestLinkType=GeneralSearchSummary&Func=Links&author=Rathore%20BS
    Name: ISI
    Category: fullText
    Text: Find this article in Web of Science
    Icon: https://imagesrvr.epnet.com/ls/20docs.gif
    MouseOverText: Find this article in Web of Science
Header DbId: edb
DbLabel: Complementary Index
An: 190407721
RelevancyScore: 1082
AccessLevel: 6
PubType: Academic Journal
PubTypeId: academicJournal
PreciseRelevancyScore: 1082.15173339844
IllustrationInfo
Items – Name: Title
  Label: Title
  Group: Ti
  Data: Fine tuning large language models for hate speech detection in Hinglish and code mixed custom dataset through a socially responsible approach for safer digital platforms.
– Name: Author
  Label: Authors
  Group: Au
  Data: <searchLink fieldCode="AR" term="%22Rathore%2C+Bhawani+Singh%22">Rathore, Bhawani Singh</searchLink><br /><searchLink fieldCode="AR" term="%22Chaurasia%2C+Sandeep%22">Chaurasia, Sandeep</searchLink>
– Name: TitleSource
  Label: Source
  Group: Src
  Data: Discover Sustainability; 12/22/2025, Vol. 6 Issue 1, p1-27, 27p
– Name: Subject
  Label: Subject Terms
  Group: Su
  Data: <searchLink fieldCode="DE" term="%22LANGUAGE+models%22">LANGUAGE models</searchLink><br /><searchLink fieldCode="DE" term="%22CODE+switching+%28Linguistics%29%22">CODE switching (Linguistics)</searchLink><br /><searchLink fieldCode="DE" term="%22ETHICAL+problems%22">ETHICAL problems</searchLink><br /><searchLink fieldCode="DE" term="%22CLASSIFICATION%22">CLASSIFICATION</searchLink><br /><searchLink fieldCode="DE" term="%22DISCRIMINATORY+language%22">DISCRIMINATORY language</searchLink>
– Name: Abstract
  Label: Abstract
  Group: Ab
  Data: This paper presents an extensive review and experimental framework for hate speech detection using large language models (LLMs). We explored traditional machine learning approaches and illustrated how state-of-the-art LLMs, such as BERT variants (DistilBERT and ModernBERT) and recent developments such as DeepSeek R1, Llama 3, Gemma 3, Mistral, and Phi 3.5, enhance text classification performance. In this study, we evaluated the effectiveness of various LLMs for Hindi–English code-mixed hate speech detection. We constructed a custom dataset from Reddit posts, which were auto-labeled and subsequently verified and validated by expert annotators. LoRA adapters were used for model fine-tuning and in-domain performance evaluation, with precision, recall, F1-score, and accuracy as metrics. To ensure a fair comparison, all the models were trained for the same number of epochs under the same training conditions. According to our findings, DeepSeek R1 outperformed larger general-purpose LLMs, such as Llama 3 and Gemma 3, albeit by small margins. DeepSeek R1 surpassed the other models with the highest accuracy of 79 percent, demonstrating a better understanding of the complexity of code-mixed hate speech detection. These results indicate that lightweight, responsibly fine-tuned LLMs can strengthen moderation for multilingual, code-mixed communities without prohibitive computing costs. In practice, the approach offers a clear path towards safer, more inclusive platforms by pairing accuracy with bias-aware development and human oversight. [ABSTRACT FROM AUTHOR]
– Name: Abstract
  Label:
  Group: Ab
  Data: <i>Copyright of Discover Sustainability is the property of Springer Nature and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract.</i> (Copyright applies to all Abstracts.)
PLink https://erproxy.cvtisr.sk/sfx/access?url=https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edb&AN=190407721
RecordInfo BibRecord:
  BibEntity:
    Identifiers:
      – Type: doi
        Value: 10.1007/s43621-025-02190-w
    Languages:
      – Code: eng
        Text: English
    PhysicalDescription:
      Pagination:
        PageCount: 27
        StartPage: 1
    Subjects:
      – SubjectFull: LANGUAGE models
        Type: general
      – SubjectFull: CODE switching (Linguistics)
        Type: general
      – SubjectFull: ETHICAL problems
        Type: general
      – SubjectFull: CLASSIFICATION
        Type: general
      – SubjectFull: DISCRIMINATORY language
        Type: general
    Titles:
      – TitleFull: Fine tuning large language models for hate speech detection in Hinglish and code mixed custom dataset through a socially responsible approach for safer digital platforms.
        Type: main
  BibRelationships:
    HasContributorRelationships:
      – PersonEntity:
          Name:
            NameFull: Rathore, Bhawani Singh
      – PersonEntity:
          Name:
            NameFull: Chaurasia, Sandeep
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 22
              M: 12
              Text: 12/22/2025
              Type: published
              Y: 2025
          Identifiers:
            – Type: issn-print
              Value: 26629984
          Numbering:
            – Type: volume
              Value: 6
            – Type: issue
              Value: 1
          Titles:
            – TitleFull: Discover Sustainability
              Type: main
ResultId 1