A Distributed Memory Algorithm for Lexicon Building

Bibliographic details
Published in:	Journal of Parallel and Distributed Computing, Vol. 44, No. 1, pp. 80-87
Main author:	Hawking, David
Format:	Journal Article
Language:	English
Published:	San Diego, CA: Elsevier Inc, 10 July 1997
Subjects:	Algorithmics. Computability. Computer arithmetics | Applied sciences | Computer science; control theory; systems | Computer systems and distributed systems. User interface | Exact sciences and technology | Information systems. Data bases | Memory organisation. Data processing | Software | Theoretical computing | System architecture | Distributed memory multiprocessor system | Algorithm performance | Information system | Hashing | Experimental study | Implementation | Communication | Document processing
ISSN:	0743-7315, 1096-0848

Description
Summary:	A parallel algorithm for preparing word frequency concordances over two specified sets of documents from a collection is presented. Good parallel efficiency is demonstrated on a 128-node distributed memory machine using sets whose combined size exceeds one gigabyte. It is demonstrated that efficiency is heavily influenced by hashing and communication strategies. A two-stage hashing algorithm is proposed to reduce communication overhead. Ways of increasing capacity are considered, and the applicability of the algorithm to other text-processing functions such as index and symbol-table building is outlined.
DOI:	10.1006/jpdc.1997.1344
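
Illustration:	The summary does not spell out the two-stage hashing algorithm, so the following is only a minimal sketch of one plausible reading, not the paper's method. It assumes that a first hash assigns each word to an owning node and that a second, node-local hash table (here a Python `Counter`) accumulates the counts. Each node pre-aggregates its own documents so that it sends one (word, count) pair per distinct word rather than one message per occurrence. The names `owner_node`, `local_counts`, `partition_by_owner`, and `build_lexicon`, the CRC32 hash, and the single-process simulation of the nodes are all hypothetical choices made for this example.

```python
from collections import Counter
import zlib

NUM_NODES = 4  # hypothetical node count for the sketch; the paper reports a 128-node machine


def owner_node(word, num_nodes=NUM_NODES):
    """Stage 1 (assumed): a deterministic hash maps each word to the node that owns it."""
    return zlib.crc32(word.encode("utf-8")) % num_nodes


def local_counts(docs):
    """Count words on one node before any communication takes place."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts


def partition_by_owner(counts, num_nodes=NUM_NODES):
    """Group local (word, count) pairs into one outgoing batch per owner node."""
    batches = [dict() for _ in range(num_nodes)]
    for word, n in counts.items():
        batches[owner_node(word, num_nodes)][word] = n
    return batches


def build_lexicon(docs_per_node):
    """Simulate the exchange in one process: each owner merges the batches sent to it.

    Stage 2 (assumed) is the owner's local hash table, modelled here as a Counter.
    """
    num_nodes = len(docs_per_node)
    owners = [Counter() for _ in range(num_nodes)]
    for docs in docs_per_node:
        batches = partition_by_owner(local_counts(docs), num_nodes)
        for node, batch in enumerate(batches):
            owners[node].update(batch)  # Counter.update adds the incoming counts
    return owners
```

Calling `build_lexicon` with one list of documents per simulated node returns one `Counter` per node; every distinct word ends up in exactly one of them, so the full concordance is the union of the per-node tables.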