I/O-efficient algorithms for top-k nearest keyword search in massive graphs

Detailed bibliography
Published in:	The VLDB Journal, Volume 26, Issue 4, pp. 563-583
Main authors:	Zhu, Qiankun; Cheng, Hong; Huang, Xin
Format:	Journal Article
Language:	English
Published:	Berlin/Heidelberg: Springer Berlin Heidelberg, 1 August 2017; Springer Nature B.V.
Subjects:	Computer Science; Database Management; Graphs; Keywords; Networks; Regular Paper; Shortest-path problems; Stores; Studies; Trees; Well construction; Nearest keywords search; I/O-efficient algorithms; Top; Massive graphs
ISSN:	1066-8888, 0949-877X

Description
Summary:	Networks emerging nowadays usually have labels or textual content on their nodes. We model such a network as an undirected graph G, in which each node is attached with zero or more keywords and each edge is assigned a length. On such networks, a novel and useful query is the top-k nearest keyword (k-NK) search. Given a query node q in G and a keyword λ, a k-NK query finds the k nodes that contain λ and are nearest to q. The k-NK problem has been studied recently in the literature, but most existing solutions assume that the graph and the constructed index fit entirely in memory. As a result, they cannot be applied directly to the very large-scale networks that are common in practice and do not fit in memory. In this work, we design an I/O-efficient solution, which uses a compact disk index to answer a k-NK query with constant I/Os. The key to an accurate k-NK result is a precise shortest distance estimation in a graph. In our solution, we follow our previous work, Qiao et al. (PVLDB 6:901–912, 2013), which uses the shortest path tree as an approximate representation of a graph and uses the tree distance between two nodes as an accurate estimation of the shortest distance between them on the graph. With this representation, the original k-NK query on a graph is reduced to answering the query on a set of trees and then assembling the results obtained from the trees. We exploit a compact tree-based index and study how to lay out the index on disk. We design a novel technique that decomposes the index tree into paths and subtrees and stores them on disk. Our theoretical analysis shows that the disk-based index is small in size and supports constant query I/Os. An extensive experimental study on massive trees and graphs with billions of edges and keywords verifies our theoretical findings and demonstrates the superiority of our method over the state-of-the-art methods in the literature.
DOI:	10.1007/s00778-017-0464-7
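
To make the k-NK query defined in the summary concrete, the following is a minimal in-memory sketch that answers it exactly with Dijkstra's algorithm, stopping once k matching nodes have been settled. It is a naive baseline for illustration only, not the disk-based tree index the paper proposes. The names `build_graph` and `k_nearest_keyword`, the adjacency-list layout, and the choice to count the query node itself when it carries the keyword are all assumptions of this sketch.

```python
import heapq


def build_graph(edges):
    """Build an undirected adjacency list from (u, v, length) triples."""
    graph = {}
    for u, v, length in edges:
        graph.setdefault(u, []).append((v, length))
        graph.setdefault(v, []).append((u, length))
    return graph


def k_nearest_keyword(graph, keywords, q, lam, k):
    """Return up to k (node, distance) pairs whose nodes contain keyword lam,
    in non-decreasing order of shortest distance from q.

    graph:    dict mapping node -> list of (neighbor, edge_length)
    keywords: dict mapping node -> set of keywords (may be missing or empty)
    Assumption of this sketch: q itself counts if it contains lam.
    """
    dist = {q: 0.0}
    heap = [(0.0, q)]
    settled = set()
    result = []
    while heap and len(result) < k:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale heap entry for an already settled node
        settled.add(u)
        # Nodes are settled in distance order, so matches arrive nearest first.
        if lam in keywords.get(u, ()):
            result.append((u, d))
        for v, length in graph.get(u, ()):
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return result


# Hypothetical toy data, invented for this example.
edges = [("a", "b", 1), ("a", "c", 4), ("b", "c", 2), ("c", "d", 1), ("b", "e", 7)]
keywords = {"b": {"cafe"}, "c": {"cafe", "bar"}, "d": {"cafe"}, "e": {"cafe"}}
graph = build_graph(edges)
print(k_nearest_keyword(graph, keywords, "a", "cafe", 2))
```

This baseline keeps the whole graph in memory and may explore a large part of it per query, which is the limitation the summary attributes to existing solutions on very large networks.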