I/O-efficient algorithms for top-k nearest keyword search in massive graphs


Bibliographic Details
Published in: The VLDB Journal, Vol. 26, No. 4, pp. 563-583
Main authors: Zhu, Qiankun; Cheng, Hong; Huang, Xin
Format: Journal Article
Language: English
Published: Berlin/Heidelberg: Springer Berlin Heidelberg, 01.08.2017
Springer Nature B.V
ISSN: 1066-8888, 0949-877X
Description
Summary: Networks emerging nowadays usually have labels or textual content on their nodes. We model such a network as an undirected graph G in which each node is attached with zero or more keywords and each edge is assigned a length. On such networks, a novel and useful query is the top-k nearest keyword (k-NK) search. Given a query node q in G and a keyword λ, a k-NK query finds the k nodes that contain λ and are nearest to q. The k-NK problem has been studied recently in the literature, but most existing solutions assume that the graph and the constructed index fit entirely in memory. As a result, they cannot be applied directly to the very large-scale networks that are common in practice but do not fit in memory. In this work, we design an I/O-efficient solution that uses a compact disk index to answer a k-NK query with a constant number of I/Os. The key to an accurate k-NK result is a precise estimate of the shortest distance in a graph. In our solution, we follow our previous work, Qiao et al. (PVLDB 6:901–912, 2013), which uses the shortest path tree as an approximate representation of a graph and uses the tree distance between two nodes as an accurate estimate of the shortest distance between them in the graph. With this representation, the original k-NK query on a graph can be reduced to answering the query on a set of trees and then assembling the results obtained from the trees. We exploit a compact tree-based index and study how to lay it out on disk. We design a novel technique that decomposes the index tree into paths and subtrees and stores them on disk. Our theoretical analysis shows that the disk-based index is small and supports a constant number of query I/Os. An extensive experimental study on massive trees and graphs with billions of edges and keywords verifies our theoretical findings and demonstrates the superiority of our method over state-of-the-art methods in the literature.
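To make the query definition in the summary concrete, the following is a minimal in-memory sketch of a k-NK query using a plain Dijkstra search. It is not the disk-based index described in the article; it is the kind of memory-resident baseline the summary says does not scale. The function name `k_nk`, the adjacency-list layout, and the sample graph are assumptions made for illustration only.

```python
import heapq

def k_nk(graph, keywords, q, keyword, k):
    """Naive k-NK baseline (illustrative, not the article's method).

    graph:    dict mapping node -> list of (neighbor, edge_length) pairs,
              with each undirected edge listed in both directions
    keywords: dict mapping node -> set of keywords attached to that node
    Returns up to k (distance, node) pairs, nearest first, for nodes
    that carry `keyword`.
    """
    dist = {q: 0.0}
    heap = [(0.0, q)]
    settled = set()
    result = []
    while heap and len(result) < k:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale heap entry; u was already settled at a smaller distance
        settled.add(u)
        # Dijkstra settles nodes in non-decreasing distance order, so the
        # first k settled nodes carrying the keyword are the k nearest.
        if keyword in keywords.get(u, ()):
            result.append((d, u))
        for v, length in graph.get(u, ()):
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return result

# Hypothetical four-node graph with edge lengths.
graph = {
    "a": [("b", 1.0), ("c", 4.0)],
    "b": [("a", 1.0), ("c", 2.0), ("d", 5.0)],
    "c": [("a", 4.0), ("b", 2.0), ("d", 1.0)],
    "d": [("b", 5.0), ("c", 1.0)],
}
keywords = {"a": {"graph"}, "b": set(), "c": {"db", "graph"}, "d": {"db"}}
```

With this sample data, `k_nk(graph, keywords, "a", "db", 2)` explores outward from `a` and stops as soon as two nodes carrying `db` have been settled. The search touches the graph directly at query time, which is why this approach requires the whole graph to be resident in memory.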
DOI:10.1007/s00778-017-0464-7