A Distributed Memory Algorithm for Lexicon Building

Bibliographic details
Published in:	Journal of Parallel and Distributed Computing, Vol. 44, No. 1, pp. 80-87
Author:	Hawking, David
Format:	Journal Article
Language:	English
Published:	San Diego, CA: Elsevier Inc., 10 July 1997
Subject terms:	Algorithmics. Computability. Computer arithmetics | Applied sciences | Computer science; control theory; systems | Computer systems and distributed systems. User interface | Exact sciences and technology | Information systems. Data bases | Memory organisation. Data processing | Software | Theoretical computing | System architecture | Distributed memory multiprocessor system | Algorithm performance | Information system | Hashing | Experimental study | Implementation | Communication | Document processing
ISSN:	0743-7315, 1096-0848

Description
Abstract:	A parallel algorithm for preparing word frequency concordances over two specified sets of documents from a collection is presented. Good parallel efficiency is demonstrated on a 128-node distributed memory machine using sets whose combined size exceeds one gigabyte. It is demonstrated that efficiency is heavily influenced by hashing and communication strategies. A two-stage hashing algorithm is proposed to reduce communication overhead. Ways of increasing capacity are considered, and the applicability of the algorithm to other text-processing functions such as index and symbol-table building is outlined.
DOI:	10.1006/jpdc.1997.1344
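The abstract does not spell out how the two-stage hashing works. One common way to cut communication in hash-partitioned word counting, which may or may not match the paper's scheme, is to aggregate counts in a local hash table on each node first and only then route each distinct word to an owner node chosen by a second hash. The Python below is a minimal single-process sketch of that idea under those assumptions. The function names `owner` and `build_concordance`, the round-robin document distribution, whitespace tokenization, and CRC-32 as the routing hash are all hypothetical illustrative choices, not details from the article.

```python
from collections import Counter
from zlib import crc32


def owner(word: str, num_nodes: int) -> int:
    """Routing hash (assumed): pick the node that owns a word's global count."""
    return crc32(word.encode("utf-8")) % num_nodes


def build_concordance(docs_a, docs_b, num_nodes=4):
    """Simulate num_nodes processors in one process.

    Returns ({word: (count_in_a, count_in_b)}, number_of_messages_sent).
    Illustrative sketch only; not the paper's implementation.
    """
    # Tag each document with its set (0 = first set, 1 = second set) and
    # deal documents out round-robin, as if each node held a slice of the collection.
    tagged = [(0, d) for d in docs_a] + [(1, d) for d in docs_b]

    # Stage 1: each node aggregates its own words in a local hash table.
    local_tables = []
    for node in range(num_nodes):
        table = Counter()
        for set_id, doc in tagged[node::num_nodes]:
            for word in doc.lower().split():
                table[(word, set_id)] += 1
        local_tables.append(table)

    # Stage 2: send each distinct (word, set) partial count to its owner node.
    inboxes = [[] for _ in range(num_nodes)]
    messages = 0
    for table in local_tables:
        for (word, set_id), count in table.items():
            inboxes[owner(word, num_nodes)].append((word, set_id, count))
            messages += 1

    # Each owner merges the partial counts it received into (A, B) pairs;
    # owners hold disjoint words, so gathering them into one dict is safe.
    concordance = {}
    for inbox in inboxes:
        for word, set_id, count in inbox:
            a, b = concordance.get(word, (0, 0))
            concordance[word] = (a + count, b) if set_id == 0 else (a, b + count)
    return concordance, messages


concordance, sent = build_concordance(["the cat sat", "the dog"], ["the cat ran"], num_nodes=2)
print(concordance["the"])  # (2, 1)
print(concordance["cat"])  # (1, 1)
```

Because counts are combined locally before routing, each node sends one message per distinct word rather than one per occurrence: a single document `a a a a` yields one message instead of four. The result does not depend on the number of simulated nodes, only the message traffic does.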