Graph-based hostile content detection in Hindi language

Bibliographic details
Published in: Discover Computing, Volume 28, Issue 1, pp. 1-23
Main authors: Angana Chakraborty, Subhankar Joardar, Dilip K. Prasad, Arif Ahmed Sekh
Format: Journal Article
Language: English
Publication details: Springer, 13 November 2025
ISSN:2948-2992
Description
Summary: Organizations and governments are struggling to handle hostile content on social media sites (Facebook™, Twitter™, etc.). While extensive research exists for English-language content, regional languages like Hindi lack robust tools and datasets for effective moderation. This study proposes a scalable AI-based framework for detecting hostile posts in Hindi, the most widely spoken language in the Indian subcontinent and the third most spoken globally. We employ both binary (coarse-grained) and multi-class, multi-label (fine-grained) classification using contextual and semantic features. Our approach integrates various BERT-based embeddings with Relational Graph Convolutional Networks (R-GCN), forming a hybrid BRGCN architecture trained on the Constraint 2021 Hindi dataset. To enhance performance, we implement a hard voting-based ensemble classifier. The proposed model achieves higher F1-scores than existing baselines: 0.98 for coarse-grained classification and 0.84, 0.61, 0.49, and 0.64 for the fine-grained categories of Fake, Hate, Defamation, and Offensive, respectively. Code and data will be made publicly available at https://github.com/mani-design/B-RGCN.
DOI:10.1007/s10791-025-09790-0
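
Note on the hard voting ensemble mentioned in the summary: the record does not describe how the authors combine their classifiers, so the following is only a minimal sketch of generic hard (majority) voting for the coarse-grained task. The function name `hard_vote`, the three model outputs, and the label strings are hypothetical and are not taken from the paper or its repository.

```python
from collections import Counter


def hard_vote(predictions):
    """Return the majority label per sample.

    predictions: list of per-model label lists, all of equal length.
    Ties go to the label that was seen first among the tied ones.
    """
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")

    voted = []
    # zip(*predictions) groups the votes of every model for one sample
    for sample_votes in zip(*predictions):
        label, _count = Counter(sample_votes).most_common(1)[0]
        voted.append(label)
    return voted


# Hypothetical outputs of three classifiers on three posts
model_a = ["hostile", "non-hostile", "hostile"]
model_b = ["hostile", "hostile", "non-hostile"]
model_c = ["non-hostile", "non-hostile", "hostile"]

ensemble = hard_vote([model_a, model_b, model_c])
print(ensemble)
```

For the fine-grained, multi-label setting, the same vote could be applied independently to each category (Fake, Hate, Defamation, Offensive), again as an assumption rather than the authors' documented procedure.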