Reliable Keyword Query Interpretation on Summary Graphs

Bibliographic Details
Published in:IEEE Transactions on Knowledge and Data Engineering, Vol. 35, no. 5, pp. 5187-5202
Main Authors: Zhong, Ming; Zheng, Yingyi; Xue, Guotong; Liu, Mengchi
Format: Journal Article
Language:English
Published: New York, The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.05.2023
ISSN:1041-4347, 1558-2191
Description
Summary:The semantic gap between keyword queries and the search intents behind them has motivated intensive study of keyword query interpretation, which aims to translate a keyword query into structured queries (a.k.a. patterns) representing the most likely relevant search intents. However, one important issue remains understudied: how to guarantee that the patterns are "reliable", meaning that the structured queries, when evaluated, return results that actually exist. In this paper, we regard reliability as a new metric for ranking patterns, and present a keyword query interpretation approach that finds pattern trees that are both reliable and relevant on an arbitrary summary graph of the underlying data. Specifically, we first propose a reliability estimation model that uses statistics, under reasonable assumptions, to measure how likely a pattern tree is to evaluate to a nonempty result set. Second, we develop constrained top-$k$ search algorithms that are guaranteed to return the optimal pattern trees for a specific keyword query. Moreover, to improve the efficiency of online search, we also design elaborate indexes, search heuristics, and pruning strategies. Lastly, we perform comprehensive experiments on two real-world datasets, DBpedia and Yago, with both QALD-9 queries and random queries. The results indicate that our approach significantly improves the accuracy and overall quality of the top-$k$ results.
DOI:10.1109/TKDE.2022.3144001