A Distributed Memory Algorithm for Lexicon Building

Detailed bibliography
Published in:	Journal of Parallel and Distributed Computing, Volume 44, Issue 1, pp. 80-87
Main author:	Hawking, David
Format:	Journal Article
Language:	English
Published:	San Diego, CA: Elsevier Inc., 10 July 1997
Subjects:	Algorithmics. Computability. Computer arithmetics; Applied sciences; Computer science; control theory; systems; Computer systems and distributed systems. User interface; Exact sciences and technology; Information systems. Data bases; Memory organisation. Data processing; Software; Theoretical computing; System architecture; Distributed memory multiprocessor system; Algorithm performance; Information system; Hashing; Experimental study; Implementation; Communication; Document processing
ISSN:	0743-7315, 1096-0848

Description
Abstract:	A parallel algorithm for preparing word frequency concordances over two specified sets of documents from a collection is presented. Good parallel efficiency is demonstrated on a 128-node distributed memory machine using sets whose combined size exceeds one gigabyte. It is demonstrated that efficiency is heavily influenced by hashing and communication strategies. A two-stage hashing algorithm is proposed to reduce communication overhead. Ways of increasing capacity are considered, and the applicability of the algorithm to other text-processing functions such as index and symbol-table building is outlined.
DOI:	10.1006/jpdc.1997.1344
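The abstract does not spell out the two-stage hashing scheme, so the following Python sketch shows only one common way to cut communication in a hash-partitioned word count. It assumes stage one is a per-node hash table that aggregates counts for both document sets locally, and stage two hashes each distinct word to an owner node that merges the partial counts. This is an illustrative interpretation, not the paper's algorithm. The function names (`owner`, `local_counts`, `build_concordance`), the whitespace tokenizer, and the CRC32 ownership hash are hypothetical choices, and message passing between nodes is simulated with in-process lists.

```python
from collections import defaultdict
import zlib

# Hypothetical sketch: NOT the paper's implementation. Nodes and messages
# are simulated in a single process; a real version would use message passing.


def owner(word, num_nodes):
    """Stable hash partitioning: the node that owns the global count for `word`."""
    return zlib.crc32(word.encode("utf-8")) % num_nodes


def local_counts(docs_a, docs_b):
    """Stage 1 (assumed): aggregate frequencies for both sets on one node."""
    counts = defaultdict(lambda: [0, 0])  # word -> [count in set A, count in set B]
    for set_index, docs in enumerate((docs_a, docs_b)):
        for doc in docs:
            for word in doc.lower().split():  # naive whitespace tokenizer
                counts[word][set_index] += 1
    return counts


def build_concordance(shards):
    """Simulate the distributed build over one (docs_a, docs_b) shard per node.

    Returns (partitions, messages_sent), where partitions[i] maps each word
    owned by node i to (count_in_a, count_in_b).
    """
    num_nodes = len(shards)

    # Stage 1: every node builds its local table with no communication.
    local_tables = [local_counts(a, b) for a, b in shards]

    # Stage 2: each node sends one message per *distinct* word to its owner,
    # rather than one message per word occurrence.
    inboxes = [[] for _ in range(num_nodes)]
    messages_sent = 0
    for table in local_tables:
        for word, (ca, cb) in table.items():
            inboxes[owner(word, num_nodes)].append((word, ca, cb))
            messages_sent += 1

    # Owners merge incoming partial counts into their slice of the concordance.
    partitions = []
    for inbox in inboxes:
        merged = defaultdict(lambda: [0, 0])
        for word, ca, cb in inbox:
            merged[word][0] += ca
            merged[word][1] += cb
        partitions.append({w: tuple(c) for w, c in merged.items()})
    return partitions, messages_sent
```

For example, with two nodes holding `(["the cat sat"], ["the dog ran"])` and `(["the cat ran"], [])`, the word `the` ends up with counts `(2, 1)`, and 8 messages are sent instead of 9 (one per raw word occurrence), because each node ships each distinct word only once. The saving grows with vocabulary repetition within a node, which is the kind of communication reduction the abstract attributes to its hashing strategy.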