Extraction and classification of dense implicit communities in the Web graph
| Title: | Extraction and classification of dense implicit communities in the Web graph |
|---|---|
| Authors: | Dourisboure Y, Geraci F, Pellegrini M |
| Source: | ACM Transactions on the Web 3 (2009). doi:10.1145/1513876.1513879 |
| Publisher Information: | Association for Computing Machinery (ACM), 2009. |
| Publication Year: | 2009 |
| Subject Terms: | communities; web graph; detection of dense subgraph; H.2.8 Database applications: Clustering; F.2.2 Nonnumerical Algorithms and Problems: Computations on discrete structures; 0202 electrical engineering, electronic engineering, information engineering; 02 engineering and technology |
| Description: | The World Wide Web (WWW) is rapidly becoming important for society as a medium for sharing data, information, and services, and there is a growing interest in tools for understanding collective behavior and emerging phenomena in the WWW. In this article we focus on the problem of searching and classifying communities in the Web. Loosely speaking a community is a group of pages related to a common interest. More formally, communities have been associated in the computer science literature with the existence of a locally dense subgraph of the Web graph (where Web pages are nodes and hyperlinks are arcs of the Web graph). The core of our contribution is a new scalable algorithm for finding relatively dense subgraphs in massive graphs. We apply our algorithm on Web graphs built on three publicly available large crawls of the Web (with raw sizes up to 120M nodes and 1G arcs). The effectiveness of our algorithm in finding dense subgraphs is demonstrated experimentally by embedding artificial communities in the Web graph and counting how many of these are blindly found. Effectiveness increases with the size and density of the communities: it is close to 100% for communities of thirty nodes or more (even at low density). It is still about 80% even for communities of twenty nodes with density over 50% of the arcs present. At the lower extremes the algorithm catches 35% of dense communities made of ten nodes. We also develop some sufficient conditions for the detection of a community under some local graph models and not-too-restrictive hypotheses. We complete our Community Watch system by clustering the communities found in the Web graph into homogeneous groups by topic and labeling each group by representative keywords. |
| Document Type: | Article |
| Language: | English |
| ISSN: | 1559-114X 1559-1131 |
| DOI: | 10.1145/1513876.1513879 |
| Access URL: | http://wwwold.iit.cnr.it/staff/marco.pellegrini/papiri/jv-cybercom-draft2.pdf https://dl.acm.org/doi/10.1145/1513876.1513879 https://dblp.uni-trier.de/db/journals/tweb/tweb3.html#DourisboureGP09 https://doi.org/10.1145/1513876.1513879 https://hdl.handle.net/20.500.14243/46197 |
| Rights: | URL: https://www.acm.org/publications/policies/copyright_policy#Background |
| Accession Number: | edsair.doi.dedup.....2d9fb6b0f62f23f62b40e83deefafc96 |
| Database: | OpenAIRE |
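
The description mentions embedding artificial communities in the Web graph and reports results by community size and density (for example, twenty nodes with over 50% of the arcs present). The following is a minimal sketch of that setup, not the authors' algorithm or code. It assumes density means the fraction of the k(k-1) possible directed arcs among k nodes that are present, which may differ from the definition used in the article. The function names, the toy background graph, and all parameters are hypothetical.

```python
import random
from itertools import permutations


def arc_density(arcs, nodes):
    """Fraction of the k*(k-1) possible directed arcs among `nodes` found in `arcs`.

    Assumed definition for illustration; the article may define density differently.
    """
    nodes = set(nodes)
    k = len(nodes)
    if k < 2:
        return 0.0
    present = sum(1 for u, v in arcs if u != v and u in nodes and v in nodes)
    return present / (k * (k - 1))


def plant_community(arcs, nodes, density, rng):
    """Add round(density * k*(k-1)) randomly chosen arcs among `nodes` to the set `arcs`."""
    candidates = list(permutations(nodes, 2))
    target = round(density * len(candidates))
    arcs.update(rng.sample(candidates, target))


rng = random.Random(0)

# Hypothetical sparse background graph: 200 nodes, 400 random arcs.
graph = set()
while len(graph) < 400:
    u, v = rng.randrange(200), rng.randrange(200)
    if u != v:
        graph.add((u, v))

# Artificial community: twenty fresh nodes with half of the possible arcs present.
community = list(range(200, 220))
plant_community(graph, community, 0.5, rng)

print(arc_density(graph, community))
```

A detection algorithm could then be run on `graph` without knowledge of `community`, and the planted node set compared against its output to count how many embedded communities are recovered.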