Effectively Detecting Content Spam on the Web Using Topical Diversity Measures

Recent studies about web spam detection have utilized various content-based and link-based features to construct a spam classification model. In this paper, we conduct a thorough analysis of content spam on the web using topic models and propose several novel topical diversity measures for content s...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:2012 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology Jg. 1; S. 266 - 273
Hauptverfasser: Dong, Cailing, Zhou, Bin
Format: Tagungsbericht
Sprache:Englisch
Veröffentlicht: IEEE 01.12.2012
Schlagworte:
ISBN:9781467360579, 1467360570
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Recent studies about web spam detection have utilized various content-based and link-based features to construct a spam classification model. In this paper, we conduct a thorough analysis of content spam on the web using topic models and propose several novel topical diversity measures for content spam detection. We adopt the web spam benchmark data set WEBSPAM-UK2007 for evaluation, and the experimental results verify that by integrating our topical diversity measures the performance of the state-of-the-art web spam detection methods can be greatly improved. In addition, comparing to existing features for training spam classification models, our topical diversity measures can achieve high spam detection performance using small set of training data. In personalized web spam detection, the training data (i.e., user's spam labeling results) are typically small. Our finding makes personalized web spam detection highly achievable. We develop an efficient and effective regression model using topical diversity measures for personalized web spam detection, and present some promising results obtained from an empirical study.
ISBN:9781467360579
1467360570
DOI:10.1109/WI-IAT.2012.98