A Distributed Memory Algorithm for Lexicon Building
| Published in: | Journal of parallel and distributed computing Vol. 44, no. 1; pp. 80-87 |
|---|---|
| Main author: | |
| Format: | Journal Article |
| Language: | English |
| Published: | San Diego, CA; Elsevier Inc; 10 July 1997; Elsevier |
| Subjects: | |
| ISSN: | 0743-7315, 1096-0848 |
| Summary: | A parallel algorithm for preparing word frequency concordances over two specified sets of documents from a collection is presented. Good parallel efficiency is demonstrated on a 128-node distributed memory machine using sets whose combined size exceeds one gigabyte. It is demonstrated that efficiency is heavily influenced by hashing and communication strategies. A two-stage hashing algorithm is proposed to reduce communication overhead. Ways of increasing capacity are considered, and the applicability of the algorithm to other text-processing functions such as index and symbol-table building is outlined. |
|---|---|
| DOI: | 10.1006/jpdc.1997.1344 |