Effectively Detecting Content Spam on the Web Using Topical Diversity Measures
| Published in: | 2012 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, Vol. 1, pp. 266-273 |
|---|---|
| Main authors: | , |
| Format: | Conference paper |
| Language: | English |
| Publication details: | IEEE, 1 December 2012 |
| Subjects: | |
| ISBN: | 9781467360579, 1467360570 |
| Summary: | Recent studies on web spam detection have utilized various content-based and link-based features to construct a spam classification model. In this paper, we conduct a thorough analysis of content spam on the web using topic models and propose several novel topical diversity measures for content spam detection. We adopt the web spam benchmark data set WEBSPAM-UK2007 for evaluation, and the experimental results verify that integrating our topical diversity measures can greatly improve the performance of state-of-the-art web spam detection methods. In addition, compared with existing features for training spam classification models, our topical diversity measures can achieve high spam detection performance using a small set of training data. In personalized web spam detection, the training data (i.e., a user's spam labeling results) are typically small. Our finding makes personalized web spam detection highly achievable. We develop an efficient and effective regression model using topical diversity measures for personalized web spam detection, and present some promising results obtained from an empirical study. |
| DOI: | 10.1109/WI-IAT.2012.98 |

