SimilCatch: Enhanced social spammers detection on Twitter using Markov Random Fields



Bibliographic Details
Published in: Information Processing & Management, Vol. 57, no. 6, p. 102317
Main Authors: El-Mawass, Nour; Honeine, Paul; Vercouter, Laurent
Format: Journal Article
Language: English
Published: Elsevier Ltd, 1 November 2020
ISSN: 0306-4573, 1873-5371
Description
Summary:
Highlights:
• Social spam evolution is causing a marked deterioration in the performance of state-of-the-art supervised classifiers.
• The Markov Random Fields formalism enables a hybrid social spam detection model that exploits both users' features and their content-based similarity.
• Users' content can be exploited to define robust similarity measures.
• Biased and inaccurate prior predictions of users' classes can be used effectively in probabilistic graphical models, as demonstrated by the significant increase in recall obtained by the proposed approach.

The problem of social spam detection has traditionally been modeled as a supervised classification problem. Despite the initial success of this approach, later analyses of proposed systems and detection features have shown that, as with email spam, the dynamic and adversarial nature of social spam makes the performance of supervised systems hard to maintain. In this paper, we investigate using the output of previously proposed supervised classification systems as a tool for spammer discovery. The hypothesis is that these systems remain highly reliable on the spammers they do detect, even when their recall is far from perfect. We then propose using the output of these classifiers as prior beliefs in a probabilistic graphical model framework, which allows beliefs to be propagated to similar social accounts. Basing similarity on a who-connects-to-whom network has been empirically critiqued in recent literature, so we propose an alternative definition based on a bipartite users-content interaction graph. For evaluation, we build a Markov Random Field on a graph of similar users, compute prior beliefs using a selection of state-of-the-art classifiers, and apply Loopy Belief Propagation to obtain posterior predictions for users. The proposed system is evaluated on a recent Twitter dataset that we collected and manually labeled.
Classification results show a significant increase in recall while precision is maintained. This validates that formulating the detection problem within an undirected graphical model framework makes it possible to restore the degraded performance of previously proposed statistical classifiers and to effectively mitigate the effect of spam evolution.
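The summary describes the pipeline only at a high level (content-based user similarity, classifier outputs as priors, Loopy Belief Propagation on a Markov Random Field). The following is a minimal Python sketch of that general technique, not the authors' implementation. Assumptions: similarity is Jaccard overlap of the content items (e.g. URLs, hashtags) two users interacted with, thresholded at a hypothetical 0.3; priors are spam probabilities from any previously trained classifier; the edge potential is a simple homophily (Potts) potential with a hypothetical parameter `eps`. The function names `similarity_edges` and `loopy_bp` and the toy data are hypothetical.

```python
from itertools import combinations

LABELS = (0, 1)  # 0 = legitimate account, 1 = spammer


def jaccard(a, b):
    """Jaccard similarity between two sets of content items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_edges(user_content, threshold=0.3):
    """Project a bipartite users-content graph onto a user-user graph.

    Assumption: two users are linked when the Jaccard similarity of the
    content items they interacted with reaches `threshold` (hypothetical).
    Returns {(u, v): similarity} with u < v.
    """
    edges = {}
    for u, v in combinations(sorted(user_content), 2):
        sim = jaccard(user_content[u], user_content[v])
        if sim >= threshold:
            edges[(u, v)] = sim
    return edges


def loopy_bp(priors, edges, eps=0.1, iters=20):
    """Sum-product loopy belief propagation on a pairwise binary MRF.

    priors: {user: P(spammer)} from some previously trained classifier.
    edges:  {(u, v): similarity}; this sketch only uses the graph structure.
    eps:    weight the homophily potential gives to disagreeing labels (hypothetical).
    Returns {user: posterior P(spammer)}.
    """
    phi = {u: (1.0 - p, p) for u, p in priors.items()}  # node potentials from priors

    def psi(xi, xj):
        # Homophily (Potts) edge potential: similar users tend to share a label.
        return 1.0 - eps if xi == xj else eps

    nbrs = {u: set() for u in priors}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)

    # msgs[(i, j)][x_j] is the message from node i to node j; start uniform.
    msgs = {(i, j): [1.0, 1.0] for i in nbrs for j in nbrs[i]}
    for _ in range(iters):
        new = {}
        for i, j in msgs:  # synchronous ("flooding") update schedule
            out = []
            for xj in LABELS:
                total = 0.0
                for xi in LABELS:
                    prod = phi[i][xi] * psi(xi, xj)
                    for k in nbrs[i] - {j}:
                        prod *= msgs[(k, i)][xi]
                    total += prod
                out.append(total)
            z = sum(out)
            new[(i, j)] = [m / z for m in out]  # normalise to avoid underflow
        msgs = new

    posteriors = {}
    for i in nbrs:
        belief = list(phi[i])
        for k in nbrs[i]:
            belief = [belief[x] * msgs[(k, i)][x] for x in LABELS]
        posteriors[i] = belief[1] / sum(belief)
    return posteriors


# Hypothetical toy data: content items each user interacted with.
user_content = {
    "a": {"url1", "url2", "#promo"},
    "b": {"url1", "url2", "#promo", "#deal"},
    "c": {"url1", "#promo", "#deal"},
    "d": {"#news", "#weather"},
}
# Hypothetical classifier outputs: confident on "a", undecided on "b" and "c".
priors = {"a": 0.95, "b": 0.5, "c": 0.5, "d": 0.2}

edges = similarity_edges(user_content)
post = loopy_bp(priors, edges)
for user in sorted(post):
    print(f"{user}: prior={priors[user]:.2f} posterior={post[user]:.2f}")
```

With these toy inputs, the undecided users "b" and "c" share content with the confidently flagged user "a", so their posteriors rise above 0.5, while the isolated user "d" keeps its prior. A natural variant, not shown here, would make the edge potential depend on the similarity weight.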
DOI: 10.1016/j.ipm.2020.102317