Graph-based hostile content detection in Hindi language
| Published in: | Discover Computing Vol. 28; no. 1; pp. 1 - 23 |
|---|---|
| Main Authors: | , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Springer, 13.11.2025 |
| Subjects: | |
| ISSN: | 2948-2992 |
| Summary: | Organizations and governments are struggling to handle hostile content on social media sites (Facebook™, Twitter™, etc.). While extensive research exists for English-language content, regional languages like Hindi lack robust tools and datasets for effective moderation. This study proposes a scalable AI-based framework for detecting hostile posts in Hindi, the most widely spoken language in the Indian subcontinent and the third most spoken globally. We employ both binary (coarse-grained) and multi-class, multi-label (fine-grained) classification using contextual and semantic features. Our approach integrates various BERT-based embeddings with Relational Graph Convolutional Networks (R-GCN), forming a hybrid BRGCN architecture trained on the Constraint 2021 Hindi dataset. To enhance performance, we implement a hard voting-based ensemble classifier. The proposed model achieves higher F1-scores than existing baselines: 0.98 for coarse-grained classification and 0.84, 0.61, 0.49, and 0.64 for the fine-grained categories of Fake, Hate, Defamation, and Offensive, respectively. Code and data will be made publicly available at https://github.com/mani-design/B-RGCN. |
| DOI: | 10.1007/s10791-025-09790-0 |
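
The summary mentions a hard voting-based ensemble over multi-label predictions but gives no details. The following is a minimal sketch of how per-label majority voting could work, not the authors' implementation. The function name `hard_vote`, the example model outputs, the strict-majority rule, and the tie-breaking toward 0 are all assumptions made for illustration.

```python
# Label order follows the fine-grained categories named in the summary.
LABELS = ["fake", "hate", "defamation", "offensive"]


def hard_vote(predictions):
    """Per-label majority vote over binary predictions from several models.

    predictions: list of per-model lists, each holding one 0/1 flag per label.
    A label is kept only when strictly more than half of the models predict
    it, so ties resolve to 0 (an assumption, not stated in the record).
    """
    n_models = len(predictions)
    # zip(*predictions) groups the flags by label instead of by model.
    votes = [sum(flags) for flags in zip(*predictions)]
    return [1 if 2 * v > n_models else 0 for v in votes]


# Hypothetical outputs of three classifiers for a single post.
model_outputs = [
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
]
ensemble = hard_vote(model_outputs)
print(dict(zip(LABELS, ensemble)))
```

Voting per label rather than per label set keeps the multi-label structure intact: a post can still end up flagged with several categories when most models agree on each one individually.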