Parallel methods for the update of partitioned inverted files

Bibliographic details
Published in: Aslib Proceedings, Vol. 59, No. 4/5, pp. 367-396
Main authors: MacFarlane, A., McCann, J.A., Robertson, S.E.
Format: Journal Article
Language: English
Published: Bradford, Emerald Group Publishing Limited, 12.07.2007
ISSN: 0001-253X, 2050-3806, 1758-3748
Description
Summary:
Purpose - An issue that tends to be ignored in information retrieval is the issue of updating inverted files. This is largely because inverted files were devised to provide fast query service, and much work has been done with the emphasis strongly on queries. This paper aims to study the effect of using parallel methods for the update of inverted files in order to reduce costs, by looking at two types of partitioning for inverted files: document identifier and term identifier.
Design/methodology/approach - Raw update service and update with query service are studied with these partitioning schemes using an incremental update strategy. The paper uses standard measures used in parallel computing such as speedup to examine the computing results and also the costs of reorganising indexes while servicing transactions.
Findings - Empirical results show that for both transaction processing and index reorganisation the document identifier method is superior. However, there is evidence that the term identifier partitioning method could be useful in a concurrent transaction processing context.
Practical implications - There is an increasing need to service updates, which is now becoming a requirement of inverted files (for dynamic collections such as the web), demonstrating that a shift in requirements of inverted file maintenance is needed from the past.
Originality/value - The paper is of value to database administrators who manage large-scale and dynamic text collections, and who need to use parallel computing to implement their text retrieval services.
Bibliography: ark:/67375/4W2-JP3Z6WNW-G
original-pdf:2760590406.pdf
filenameID:2760590406
istex:9653DAF4E64D74F6DB1532F543B7BC16453E05FD
href:00012530710817582.pdf
DOI: 10.1108/00012530710817582