I/O-efficient algorithms for top-k nearest keyword search in massive graphs
| Published in: | The VLDB Journal, Vol. 26, No. 4, pp. 563-583 |
|---|---|
| Main authors: | , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Berlin/Heidelberg: Springer Berlin Heidelberg, 1 August 2017; Springer Nature B.V |
| Subjects: | |
| ISSN: | 1066-8888, 0949-877X |
| Summary: | Networks emerging nowadays usually have labels or textual content on the nodes. We model such a commonly seen network as an undirected graph G, in which each node is attached with zero or more keywords, and each edge is assigned a length. On such networks, a novel and useful query is called top-k nearest keyword (k-NK) search. Given a query node q in G and a keyword λ, a k-NK query searches for the k nodes that contain λ and are nearest to q. The k-NK problem has been studied recently in the literature, but most existing solutions assume that the graph as well as the constructed index can fit entirely in memory. As a result, they cannot be applied directly to the very large-scale networks that are commonly found in practice and cannot fit in memory. In this work, we design an I/O-efficient solution, which uses a compact disk index to answer a k-NK query with constant I/Os. The key to an accurate k-NK result is a precise shortest distance estimation in a graph. In our solution, we follow our previous work, Qiao et al. (PVLDB 6:901–912, 2013), which uses the shortest path tree as an approximate representation of a graph and uses the tree distance between two nodes as an accurate estimation of the shortest distance between them on the graph. With such a representation, the original k-NK query on a graph can be reduced to answering the query on a set of trees and then assembling the results obtained from the trees. We exploit a compact tree-based index and study how to lay out the index on disk. We design a novel technique which decomposes the index tree into paths and subtrees and stores them on disk. Our theoretical analysis shows that the disk-based index is small in size and supports constant query I/Os. An extensive experimental study on massive trees and graphs with billions of edges and keywords verifies our theoretical findings and demonstrates the superiority of our method over the state-of-the-art methods in the literature. |
| DOI: | 10.1007/s00778-017-0464-7 |
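
To make the k-NK query defined in the summary concrete, here is a minimal in-memory sketch in Python. It is not the paper's disk-based index or its tree-based distance estimation: it runs plain Dijkstra search from q and stops once k nodes containing λ have been settled, which is the kind of in-memory approach the summary says does not scale to massive graphs. The names `build_graph` and `knk_query`, the adjacency-list layout, and the toy graph are all assumptions made for illustration. The sketch also assumes that q itself counts as a result when it contains λ.

```python
import heapq


def build_graph(edges):
    """Build an undirected adjacency list from (u, v, length) triples (hypothetical helper)."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    return adj


def knk_query(adj, keywords, q, lam, k):
    """Return up to k (distance, node) pairs: the nodes containing keyword
    `lam` that are nearest to `q`, in order of increasing exact distance.

    Baseline only: exact Dijkstra search, entirely in memory.
    Edge lengths are assumed to be non-negative.
    """
    dist = {q: 0.0}
    heap = [(0.0, q)]
    settled = set()
    result = []
    while heap and len(result) < k:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale heap entry
        settled.add(u)
        # Nodes are settled in non-decreasing distance order, so the first
        # k matching nodes settled are the k nearest ones.
        if lam in keywords.get(u, ()):
            result.append((d, u))
        for v, w in adj.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return result


# Toy graph (illustrative data, not from the paper).
adj = build_graph([
    ("a", "b", 1), ("b", "c", 2), ("a", "d", 4), ("c", "d", 1), ("d", "e", 3),
])
keywords = {"b": {"cafe"}, "c": {"bar"}, "d": {"cafe"}, "e": {"cafe", "bar"}}

print(knk_query(adj, keywords, "a", "cafe", 2))
```

The early exit keeps the search local when matching nodes are close to q, but every step still touches the graph in memory; the summary's point is that an index laid out on disk is needed once the graph no longer fits.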