A Distributed Memory Algorithm for Lexicon Building
A parallel algorithm for preparing word frequency concordances over two specified sets of documents from a collection is presented. Good parallel efficiency is demonstrated on a 128-node distributed memory machine using sets whose combined size exceeds one gigabyte. It is demonstrated that efficienc...
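The abstract describes building word frequency concordances over two document sets on a distributed memory machine, with hashing deciding where counts end up. The following is a minimal single-process sketch of that general idea, not the paper's actual algorithm: the function names (`owner`, `local_counts`, `build_concordance`), the use of CRC32 as the hash, and the whitespace tokenizer are all assumptions made for illustration. Each simulated node first counts words in its own documents, then forwards one aggregated count per distinct word to the node that owns that word.

```python
from collections import Counter
import zlib


def owner(word, num_nodes):
    """Map a word to the node that holds its final counts (hypothetical scheme)."""
    # CRC32 is used only because it is stable across runs; it is an assumption.
    return zlib.crc32(word.encode("utf-8")) % num_nodes


def local_counts(docs):
    """Count word occurrences in one node's share of a document set."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts


def build_concordance(partitions_a, partitions_b):
    """Simulate the nodes in one process.

    partitions_a[i] and partitions_b[i] are the documents of set A and set B
    stored on node i. Returns one dict per node mapping each word that node
    owns to [frequency in A, frequency in B].
    """
    num_nodes = len(partitions_a)
    # Step 1: every node aggregates its own documents locally.
    local = [(local_counts(a), local_counts(b))
             for a, b in zip(partitions_a, partitions_b)]
    # Step 2: every node sends one message per distinct word to that word's owner.
    merged = [dict() for _ in range(num_nodes)]
    for counts_a, counts_b in local:
        for word in counts_a.keys() | counts_b.keys():
            entry = merged[owner(word, num_nodes)].setdefault(word, [0, 0])
            entry[0] += counts_a[word]  # Counter yields 0 for absent words
            entry[1] += counts_b[word]
    return merged
```

Aggregating locally before routing means a word that occurs many times on one node costs a single message rather than one per occurrence, which is one plausible way that hashing and communication strategy could interact as the abstract suggests.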
| Published in: | Journal of parallel and distributed computing, Volume 44, Issue 1, pp. 80-87 |
|---|---|
| Main author: | Hawking, David |
| Format: | Journal Article |
| Language: | English |
| Published: | San Diego, CA: Elsevier Inc, 10 July 1997 |
| Subjects: | |
| ISSN: | 0743-7315, 1096-0848 |
| Abstract | A parallel algorithm for preparing word frequency concordances over two specified sets of documents from a collection is presented. Good parallel efficiency is demonstrated on a 128-node distributed memory machine using sets whose combined size exceeds one gigabyte. It is demonstrated that efficiency is heavily influenced by hashing and communication strategies. A two-stage hashing algorithm is proposed to reduce communication overhead. Ways of increasing capacity are considered, and the applicability of the algorithm to other text-processing functions such as index and symbol-table building is outlined. |
|---|---|
| Author | Hawking, David |
| Author_xml | sequence: 1; givenname: David; surname: Hawking; fullname: Hawking, David; organization: Cooperative Research Centre for Advanced Computational Systems, Department of Computer Science, Australian National University, Canberra, Australian Capital Territory, 0200, Australia |
| BackLink | http://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=2841389 (View record in Pascal Francis) |
| Cites_doi | 10.1002/(SICI)1097-4571(199508)46:7<537::AID-ASI7>3.0.CO;2-P 10.1145/4078.4080 10.1016/0306-4573(91)90085-Z 10.6028/NIST.SP.500-236.overview 10.1002/(SICI)1097-4571(199012)41:8<581::AID-ASI4>3.0.CO;2-U |
| ContentType | Journal Article |
| Copyright | 1997 Academic Press; 1997 INIST-CNRS |
| DBID | AAYXX CITATION IQODW |
| DOI | 10.1006/jpdc.1997.1344 |
| DatabaseName | CrossRef Pascal-Francis |
| DatabaseTitle | CrossRef |
| DatabaseTitleList | |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science Applied Sciences |
| EISSN | 1096-0848 |
| EndPage | 87 |
| ExternalDocumentID | 2841389 10_1006_jpdc_1997_1344 S0743731597913447 |
| ISICitedReferencesCount | 0 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=10_1006_jpdc_1997_1344&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 0743-7315 |
| IngestDate | Mon Jul 21 09:16:02 EDT 2025 Sat Nov 29 07:12:40 EST 2025 Fri Feb 23 02:27:54 EST 2024 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 1 |
| Keywords | System architecture; Distributed memory multiprocessor system; Algorithm performance; Information system; Hashing; Experimental study; Implementation; Communication; Document processing |
| Language | English |
| License | CC BY 4.0 |
| LinkModel | OpenURL |
| PageCount | 8 |
| ParticipantIDs | pascalfrancis_primary_2841389 crossref_primary_10_1006_jpdc_1997_1344 elsevier_sciencedirect_doi_10_1006_jpdc_1997_1344 |
| PublicationCentury | 1900 |
| PublicationDate | 1997-07-10 |
| PublicationDateYYYYMMDD | 1997-07-10 |
| PublicationDate_xml | – month: 07 year: 1997 text: 1997-07-10 day: 10 |
| PublicationDecade | 1990 |
| PublicationPlace | San Diego, CA |
| PublicationPlace_xml | – name: San Diego, CA |
| PublicationTitle | Journal of parallel and distributed computing |
| PublicationYear | 1997 |
| Publisher | Elsevier Inc Elsevier |
| Publisher_xml | – name: Elsevier Inc – name: Elsevier |
| SSID | ssj0011578 |
| Score | 1.472298 |
| SourceID | pascalfrancis crossref elsevier |
| SourceType | Index Database Publisher |
| StartPage | 80 |
| SubjectTerms | Algorithmics. Computability. Computer arithmetics Applied sciences Computer science; control theory; systems Computer systems and distributed systems. User interface Exact sciences and technology Information systems. Data bases Memory organisation. Data processing Software Theoretical computing |
| Title | A Distributed Memory Algorithm for Lexicon Building |
| URI | https://dx.doi.org/10.1006/jpdc.1997.1344 |
| Volume | 44 |