Reliable Keyword Query Interpretation on Summary Graphs
| Published in: | IEEE transactions on knowledge and data engineering Vol. 35; no. 5; pp. 5187 - 5202 |
|---|---|
| Main Authors: | , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: IEEE, 01.05.2023; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| ISSN: | 1041-4347, 1558-2191 |
| Summary: | The semantic gap between keyword queries and the search intents behind them has motivated intensive study of keyword query interpretation, which aims to translate a keyword query into structured queries (a.k.a. patterns) representing the most likely relevant search intents. However, one important issue remains understudied: how to guarantee that the patterns are "reliable", meaning that the structured queries evaluate to results that actually exist. In this paper, we treat reliability as a new metric for ranking patterns, and present a keyword query interpretation approach that finds pattern trees that are both reliable and relevant on an arbitrary summary graph of the underlying data. Specifically, we first propose a reliability estimation model that uses statistics, under reasonable assumptions, to measure how likely a pattern tree is to evaluate to a nonempty result set. Second, we develop constrained top-$k$ search algorithms that are guaranteed to return the optimal pattern trees for a given keyword query. Moreover, to improve the efficiency of online search, we design elaborate indexes, search heuristics, and pruning strategies. Lastly, we perform comprehensive experiments on two real-world datasets, DBpedia and Yago, with both QALD-9 queries and random queries. The results indicate that our approach significantly improves the accuracy and overall quality of the top-$k$ results. |
|---|---|
| DOI: | 10.1109/TKDE.2022.3144001 |