A Distributed Memory Algorithm for Lexicon Building


Bibliographic details
Published in: Journal of parallel and distributed computing, Volume 44, Issue 1, pp. 80-87
Main author: Hawking, David
Format: Journal Article
Language: English
Published: San Diego, CA, Elsevier Inc, 10 July 1997
Elsevier
ISSN:0743-7315, 1096-0848
Description
Summary: A parallel algorithm for preparing word frequency concordances over two specified sets of documents from a collection is presented. Good parallel efficiency is demonstrated on a 128-node distributed memory machine using sets whose combined size exceeds one gigabyte. It is demonstrated that efficiency is heavily influenced by hashing and communication strategies. A two-stage hashing algorithm is proposed to reduce communication overhead. Ways of increasing capacity are considered, and the applicability of the algorithm to other text-processing functions such as index and symbol-table building is outlined.
DOI:10.1006/jpdc.1997.1344
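
Illustrative sketch: The summary names a two-stage hashing scheme but gives no details, so the following Python sketch should not be read as the article's algorithm. It only illustrates one common way hashing can cut communication when counting word frequencies across nodes: each simulated node first aggregates counts in a local hash table, then a second hash assigns every distinct word to an owner node, so a word's count crosses the network once per node rather than once per occurrence. The names `owner_of`, `local_counts`, and `build_lexicon` are hypothetical, the "nodes" are plain Python lists, and whitespace tokenisation is an assumption.

```python
from collections import Counter
import zlib


def owner_of(word: str, num_nodes: int) -> int:
    """Map a word to the node that holds its global count.

    CRC32 is used because it is stable across processes; Python's
    built-in hash() for strings is randomised per interpreter run.
    """
    return zlib.crc32(word.encode("utf-8")) % num_nodes


def local_counts(documents):
    """Aggregate word counts for one node's documents in a local table."""
    counts = Counter()
    for doc in documents:
        # Assumption: lower-cased, whitespace-separated tokens.
        counts.update(doc.lower().split())
    return counts


def build_lexicon(docs_per_node):
    """Simulate the exchange: returns one Counter (lexicon shard) per node."""
    num_nodes = len(docs_per_node)
    # outboxes[src][dst] is the batch of (word -> count) that src sends to dst.
    outboxes = [[{} for _ in range(num_nodes)] for _ in range(num_nodes)]
    for src, docs in enumerate(docs_per_node):
        for word, n in local_counts(docs).items():
            outboxes[src][owner_of(word, num_nodes)][word] = n
    # Each owner merges the batches addressed to it.
    shards = [Counter() for _ in range(num_nodes)]
    for src in range(num_nodes):
        for dst in range(num_nodes):
            shards[dst].update(outboxes[src][dst])
    return shards


if __name__ == "__main__":
    shards = build_lexicon([["the cat sat", "the dog"], ["the cat ran"]])
    for node, shard in enumerate(shards):
        print(node, dict(shard))
```

In this sketch the local aggregation step is what reduces message volume: a frequent word contributes one entry to one outgoing batch per node, however often it occurs in that node's documents.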