Graph-based hostile content detection in Hindi language
| Published in: | Discover Computing, Vol. 28, No. 1, pp. 1-23 |
|---|---|
| Main authors: | Angana Chakraborty, Subhankar Joardar, Dilip K. Prasad, Arif Ahmed Sekh |
| Format: | Journal Article |
| Language: | English |
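| Published: | Springer, 13 November 2025 |
| Subjects: | BERT; Hindi; Hostility detection; Natural language processing; R-GCN; Social networking |
| ISSN: | 2948-2992 |
| Online access: | https://doaj.org/article/4580b4f1e2a34c928193e79ab215df48 |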
| Abstract | Organizations and governments are struggling to handle hostile content on social media sites (Facebook™, Twitter™, etc.). While extensive research exists for English-language content, regional languages such as Hindi lack robust tools and datasets for effective moderation. This study proposes a scalable AI-based framework for detecting hostile posts in Hindi, the most widely spoken language in the Indian subcontinent and the third most spoken globally. We employ both binary (coarse-grained) classification and multi-class, multi-label (fine-grained) classification using contextual and semantic features. Our approach integrates various BERT-based embeddings with Relational Graph Convolutional Networks (R-GCN), forming a hybrid BRGCN architecture trained on the Constraint 2021 Hindi dataset. To further improve performance, we implement a hard voting-based ensemble classifier. The proposed model achieves higher F1-scores than existing baselines: 0.98 for coarse-grained classification and 0.84, 0.61, 0.49, and 0.64 for the fine-grained categories Fake, Hate, Defamation, and Offensive, respectively. Code and data will be made publicly available at https://github.com/mani-design/B-RGCN. |
|---|---|
| Author | Arif Ahmed Sekh; Angana Chakraborty; Dilip K. Prasad; Subhankar Joardar |
| Affiliations | 1. Angana Chakraborty (Department of Computer Science and Engineering, Haldia Institute of Technology); 2. Subhankar Joardar (School of Computer Science, Electronics and Informatics, Haldia Institute of Technology); 3. Dilip K. Prasad (UiT The Arctic University of Norway); 4. Arif Ahmed Sekh (UiT The Arctic University of Norway) |
| DOI | 10.1007/s10791-025-09790-0 |
| DatabaseName | Directory of Open Access Journals (DOAJ) |
| EISSN | 2948-2992 |
| EndPage | 23 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 1 |
| Language | English |
| OpenAccessLink | https://doaj.org/article/4580b4f1e2a34c928193e79ab215df48 |
| PageCount | 23 |
| PublicationDate | 2025-11-13 |
| PublicationTitle | Discover Computing |
| PublicationYear | 2025 |
| Publisher | Springer |
| StartPage | 1 |
| SubjectTerms | BERT; Hindi; Hostility detection; Natural language processing; R-GCN; Social networking |
| Title | Graph-based hostile content detection in Hindi language |
| Volume | 28 |
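
The hard voting ensemble and the per-category F1-scores mentioned in the abstract can be illustrated with a minimal, dependency-free sketch. It assumes each base model (for instance, BRGCN variants built on different BERT embeddings) emits one 0/1 decision per fine-grained label for every post, and that the ensemble takes a strict per-label majority. The names `hard_vote`, `f1_per_label` and `LABELS`, the tie rule, and all toy predictions are hypothetical and are not taken from the authors' repository.

```python
from typing import List, Sequence

# Hypothetical label order for the fine-grained task (illustrative assumption).
LABELS = ["fake", "hate", "defamation", "offensive"]


def hard_vote(predictions: Sequence[Sequence[Sequence[int]]]) -> List[List[int]]:
    """Per-label majority vote over several models' 0/1 multi-label outputs.

    predictions[m][i][k] is model m's 0/1 decision for post i and label k.
    A label is set to 1 only when strictly more than half of the models
    vote 1, so ties (possible with an even number of models) resolve to 0.
    """
    n_models = len(predictions)
    if n_models == 0:
        raise ValueError("need at least one model")
    n_posts = len(predictions[0])
    voted: List[List[int]] = []
    for i in range(n_posts):
        n_labels = len(predictions[0][i])
        row = []
        for k in range(n_labels):
            votes = sum(predictions[m][i][k] for m in range(n_models))
            row.append(1 if 2 * votes > n_models else 0)
        voted.append(row)
    return voted


def f1_per_label(y_true: Sequence[Sequence[int]],
                 y_pred: Sequence[Sequence[int]]) -> List[float]:
    """Binary F1 = 2TP / (2TP + FP + FN), computed separately for each label.

    If a label has no true and no predicted positives, its F1 is defined
    as 0.0 here (a convention; other tools may handle this case differently).
    """
    n_labels = len(y_true[0])
    scores: List[float] = []
    for k in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[k] == 1 and p[k] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[k] == 0 and p[k] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[k] == 1 and p[k] == 0)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


if __name__ == "__main__":
    # Toy 0/1 outputs of three hypothetical base models for two posts.
    model_a = [[1, 0, 0, 1], [0, 1, 0, 0]]
    model_b = [[1, 1, 0, 0], [0, 1, 0, 1]]
    model_c = [[0, 0, 0, 1], [0, 1, 1, 0]]
    ensemble = hard_vote([model_a, model_b, model_c])
    print(ensemble)  # [[1, 0, 0, 1], [0, 1, 0, 0]]

    gold = [[1, 0, 0, 1], [0, 1, 0, 1]]  # toy reference labels, not real data
    scores = f1_per_label(gold, ensemble)
    print({name: round(s, 3) for name, s in zip(LABELS, scores)})
    # {'fake': 1.0, 'hate': 1.0, 'defamation': 0.0, 'offensive': 0.667}
```

With an odd number of base models a per-label tie cannot occur. With an even number, the strict-majority rule in this sketch resolves ties to 0 (not hostile for that label); the abstract does not specify how the authors break ties.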