A Distributed Memory Algorithm for Lexicon Building

Detailed bibliography
Published in: Journal of Parallel and Distributed Computing, Vol. 44, No. 1, pp. 80-87
Main author: Hawking, David
Format: Journal Article
Language: English
Published: San Diego, CA: Elsevier Inc., 10 July 1997
ISSN: 0743-7315, 1096-0848
Abstract: A parallel algorithm for preparing word frequency concordances over two specified sets of documents from a collection is presented. Good parallel efficiency is demonstrated on a 128-node distributed memory machine using sets whose combined size exceeds one gigabyte. It is demonstrated that efficiency is heavily influenced by hashing and communication strategies. A two-stage hashing algorithm is proposed to reduce communication overhead. Ways of increasing capacity are considered, and the applicability of the algorithm to other text-processing functions such as index and symbol-table building is outlined.
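The abstract does not spell out the two-stage hashing scheme. The Python sketch below shows one common way such a scheme can cut communication, under stated assumptions: each simulated node first aggregates word counts for the two document sets in a local hash table, and a second hash (crc32 of the word, modulo the node count) then selects the node that owns each word, so every node sends one record per distinct word rather than one per occurrence. Nodes are simulated with Python lists instead of real message passing; the function names (`tokenize`, `owner_of`, `local_counts`, `build_concordance`) and the simple tokenizer are hypothetical, and this is not presented as the paper's actual implementation.

```python
import re
import zlib
from collections import defaultdict

# Hypothetical tokenizer: lower-case alphanumeric runs only (a simplifying assumption).
WORD_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Split text into lower-case alphanumeric words."""
    return WORD_RE.findall(text.lower())


def owner_of(word, num_nodes):
    """Second-stage hash: pick the node that owns a word (crc32 is stable across runs)."""
    return zlib.crc32(word.encode("utf-8")) % num_nodes


def local_counts(docs):
    """First stage: count words in this node's documents only.

    docs is a list of (set_id, text) pairs, where set_id is 0 or 1 for the
    two document sets. Returns {word: [count_in_set_0, count_in_set_1]}.
    """
    table = defaultdict(lambda: [0, 0])
    for set_id, text in docs:
        for word in tokenize(text):
            table[word][set_id] += 1
    return table


def build_concordance(node_docs):
    """Simulate P nodes: local hashing, exchange by owner hash, then merge."""
    p = len(node_docs)
    # Stage 1: purely local work, no communication.
    tables = [local_counts(docs) for docs in node_docs]
    # outboxes[src][dst] holds the (word, counts) records node src would send to node dst.
    outboxes = [[[] for _ in range(p)] for _ in range(p)]
    for src, table in enumerate(tables):
        for word, counts in table.items():
            outboxes[src][owner_of(word, p)].append((word, counts))
    # Stage 2: each owner merges the records it receives from every sender.
    partitions = []
    for dst in range(p):
        merged = defaultdict(lambda: [0, 0])
        for src in range(p):
            for word, (c0, c1) in outboxes[src][dst]:
                merged[word][0] += c0
                merged[word][1] += c1
        partitions.append(dict(merged))
    return partitions


if __name__ == "__main__":
    nodes = [
        [(0, "the cat sat"), (1, "the dog")],
        [(0, "a cat"), (1, "the the dog ran")],
    ]
    parts = build_concordance(nodes)
    concordance = {w: c for part in parts for w, c in part.items()}
    print(concordance["the"])  # -> [1, 3]
```

Because owners are chosen by hashing, each word's final counts end up on exactly one node, and the full concordance is simply the union of the per-node partitions. A genuine distributed-memory version would replace the outboxes with an all-to-all message exchange between nodes.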
Author affiliation: Cooperative Research Centre for Advanced Computational Systems, Department of Computer Science, Australian National University, Canberra, Australian Capital Territory, 0200, Australia
Copyright: 1997 Academic Press
DOI: 10.1006/jpdc.1997.1344
Keywords: System architecture; Distributed memory multiprocessor system; Algorithm performance; Information system; Hashing; Experimental study; Implementation; Communication; Document processing
Subject terms:
  Algorithmics. Computability. Computer arithmetics
  Applied sciences
  Computer science; control theory; systems
  Computer systems and distributed systems. User interface
  Exact sciences and technology
  Information systems. Data bases
  Memory organisation. Data processing
  Software
  Theoretical computing