An Improved Algorithm for Fast K-Word Proximity Search Based on Multi-Component Key Indexes

A search query consists of several words. In a proximity full-text search, we want to find documents that contain these words near each other. This task requires much time when the query consists of high-frequently occurring words. If we cannot avoid this task by excluding high-frequently occurring...

Full description

Saved in:
Bibliographic Details
Published in:arXiv.org
Main Author: Veretennikov, Alexander B
Format: Paper
Language:English
Published: Ithaca Cornell University Library, arXiv.org 06.09.2020
Subjects:
ISSN:2331-8422
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:A search query consists of several words. In a proximity full-text search, we want to find documents that contain these words near each other. This task requires much time when the query consists of high-frequently occurring words. If we cannot avoid this task by excluding high-frequently occurring words from consideration by declaring them as stop words, then we can optimize our solution by introducing additional indexes for faster execution. In a previous work, we discussed how to decrease the search time with multi-component key indexes. We had shown that additional indexes can be used to improve the average query execution time up to 130 times if queries consisted of high-frequently occurring words. In this paper, we present another search algorithm that overcomes some limitations of our previous algorithm and provides even more performance gain. This is a pre-print of a contribution published in Arai K., Kapoor S., Bhatia R. (eds) Intelligent Systems and Applications. IntelliSys 2020. Advances in Intelligent Systems and Computing, vol 1251, published by Springer, Cham. The final authenticated version is available online at: https://doi.org/10.1007/978-3-030-55187-2_37
Bibliography:SourceType-Working Papers-1
ObjectType-Working Paper/Pre-Print-1
content type line 50
ISSN:2331-8422
DOI:10.48550/arxiv.2009.02684