Pattern Matching in Hypertext
The importance of hypertext has been steadily growing over the past decade. The Internet and other information systems use hypertext format, with data organized associatively rather than sequentially or relationally. A myriad of textual problems have been considered in the pattern matching field wit...
| Published in: | Journal of Algorithms, Volume 35, Issue 1, pp. 82-99 |
|---|---|
| Main authors: | Amir, Amihood; Lewenstein, Moshe; Lewenstein, Noa |
| Format: | Journal Article |
| Language: | English |
| Published: | San Diego, CA: Elsevier Inc, 1 April 2000 |
| ISSN: | 0196-6774, 1090-2678 |
| Abstract | The importance of hypertext has been steadily growing over the past decade. The Internet and other information systems use hypertext format, with data organized associatively rather than sequentially or relationally. A myriad of textual problems have been considered in the pattern matching field, with many nontrivial results. Nevertheless, surprisingly little work has been done on the natural combination of pattern matching and hypertext. In contrast to regular text, hypertext has a nonlinear structure, and the techniques of pattern matching for text cannot be directly applied to hypertext. Manber and Wu (1992, “IAPR Workshop on Structural and Syntactic Pattern Recognition, Bern, Switzerland”) pioneered the study of pattern matching in hypertext and defined a hypertext model for pattern matching. Akutsu (1993, “Proceedings of the 4th Symposium on Combinatorial Pattern Matching, Padova, Italy,” pp. 1–10) developed an algorithm that can be used for exact pattern matching in a tree-structured hypertext. Park and Kim (1995, “6th Symposium on Combinatorial Pattern Matching, Helsinki, Finland”) considered regular pattern matching in hypertext. They developed a complex algorithm that works for hypertext whose underlying structure is a DAG. In this paper we present a much simpler algorithm that achieves the same complexity and runs on any hypertext graph. We then extend the problem to approximate pattern matching in hypertext, first considering Hamming distance and then edit distance. We show that, in contrast to regular text, it does make a difference whether the errors occur in the hypertext or in the pattern. The approximate pattern matching problem in hypertext with errors in the hypertext turns out to be NP-complete, whereas the approximate pattern matching problem in hypertext with errors in the pattern has a polynomial time solution. |
|---|---|
| Author | Amir, Amihood; Lewenstein, Moshe; Lewenstein, Noa |
| CODEN | JOALDV |
| CitedBy_id | crossref_primary_10_1007_s00453_022_00989_x crossref_primary_10_3390_a14010014 crossref_primary_10_1007_s00453_016_0271_3 crossref_primary_10_1016_j_ic_2021_104748 crossref_primary_10_1089_cmb_2024_0601 crossref_primary_10_1007_s00224_024_10194_8 crossref_primary_10_1186_s12859_018_2436_3 crossref_primary_10_1016_j_ic_2007_06_001 crossref_primary_10_1016_j_jda_2012_10_001 crossref_primary_10_1145_3588334 crossref_primary_10_1145_3301312 crossref_primary_10_1007_s00453_022_01007_w crossref_primary_10_1093_bioadv_vbad167 crossref_primary_10_1101_gr_279143_124 crossref_primary_10_1016_j_jclepro_2023_136888 crossref_primary_10_1016_j_ipl_2009_04_012 crossref_primary_10_1186_s12859_020_03590_7 crossref_primary_10_1089_cmb_2019_0066 crossref_primary_10_1089_cmb_2022_0411 |
| Cites_doi | 10.1006/inco.1995.1090 10.1016/0196-6774(89)90010-2 10.1142/9789812797919_0002 10.1016/S0022-0000(05)80047-9 10.1007/BFb0029792 10.1007/3-540-60044-2_51 10.1137/0206024 10.1137/0216067 |
| ContentType | Journal Article |
| Copyright | 2000 Academic Press; 2000 INIST-CNRS |
| DBID | AAYXX CITATION IQODW |
| DOI | 10.1006/jagm.1999.1063 |
| DatabaseName | CrossRef; Pascal-Francis |
| DatabaseTitle | CrossRef |
| DatabaseTitleList | |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science; Applied Sciences |
| EISSN | 1090-2678 |
| EndPage | 99 |
| ExternalDocumentID | 1308214 10_1006_jagm_1999_1063 S0196677499910635 |
| ISICitedReferencesCount | 40 |
| ISSN | 0196-6774 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 1 |
| Keywords | pattern matching; hypertext; design and analysis of algorithms; combinatorial algorithms on words; pattern matching on hypertext; Design; Hamming distance; Hypertext; Analysis; Combinatorial algorithm; Graph theory; Algorithm; Computational complexity; Pattern matching; Polynomial time |
| Language | English |
| License | https://www.elsevier.com/tdm/userlicense/1.0 CC BY 4.0 |
| PageCount | 18 |
| PublicationCentury | 2000 |
| PublicationDate | 2000-04-01 |
| PublicationDateYYYYMMDD | 2000-04-01 |
| PublicationDecade | 2000 |
| PublicationPlace | San Diego, CA |
| PublicationTitle | Journal of algorithms |
| PublicationYear | 2000 |
| Publisher | Elsevier Inc; Elsevier |
| References | [1] Abrahamson, Generalized string matching, SIAM J. Comput. 16 (1987), 1039–1051, doi:10.1137/0216067. [2] T. Akutsu, A linear time pattern matching algorithm between a string and a tree, Proceedings of the 4th Symposium on Combinatorial Pattern Matching, Padova, Italy, 1993, pp. 1–10, doi:10.1007/BFb0029792. [3] Amir, Farach, Giancarlo, Galil, Park, Dynamic dictionary matching, J. Comput. System Sci. 49 (1994), 208–222, doi:10.1016/S0022-0000(05)80047-9. [4] Amir, Farach, Idury, La Poutré, Schäffer, Improved dynamic dictionary matching, Inform. and Comput. 119 (1995), 258–282, doi:10.1006/inco.1995.1090. [5] Aviad, HyperTalmud: A Hypertext System for the Babylonian Talmud and Its Commentaries, 1993. [6] Boyer, Moore, A fast string searching algorithm, Comm. Assoc. Comput. Mach. 20 (1977), 762–772. [7] Cormen, Leiserson, Rivest, Introduction to Algorithms, 1990. [8] Fischer, Paterson, String matching and other products, Complexity of Computation, 1974, pp. 113–125. [9] Ferragina, Grossi, Optimal on-line search and sublinear time update in string matching, Proc. 7th ACM-SIAM Symposium on Discrete Algorithms, 1995. [10] Fraenkel, Klein, Information Retrieval from Annotated Texts, 1995. [11] Knuth, Morris, Pratt, Fast pattern matching in strings, SIAM J. Comput. 6 (1977), 323–350, doi:10.1137/0206024. [12] S. Rao Kosaraju, Efficient string matching, manuscript, 1987. [13] Landau, Vishkin, Fast parallel and serial approximate string matching, J. Algorithms 10 (1989), 157–169, doi:10.1016/0196-6774(89)90010-2. [14] U. Manber and S. Wu, Approximate string matching with arbitrary costs for text and hypertext, IAPR Workshop on Structural and Syntactic Pattern Recognition, Bern, Switzerland, 1992, doi:10.1142/9789812797919_0002. [15] Nielsen, Hypertext and Hypermedia, 1993. [16] K. Park and D. K. Kim, String matching in hypertext, 6th Symposium on Combinatorial Pattern Matching, Helsinki, Finland, 1995, doi:10.1007/3-540-60044-2_51. [17] Sahinalp, Vishkin, Efficient approximate and dynamic matching of patterns using a labeling paradigm, Proc. 36th FOCS, 1996. |
| StartPage | 82 |
| SubjectTerms | Applied sciences; combinatorial algorithms on words; design and analysis of algorithms; Exact sciences and technology; hypertext; Mathematical programming; Operational research and scientific management; Operational research. Management science; pattern matching; pattern matching on hypertext |
| Title | Pattern Matching in Hypertext |
| URI | https://dx.doi.org/10.1006/jagm.1999.1063 |
| Volume | 35 |
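
The abstract describes exact pattern matching on an arbitrary hypertext graph but gives no procedure. The following is only a naive illustrative sketch, not the algorithm from the paper. It assumes a hypertext model in which each node holds a non-empty string, a link is followed only after the last character of a node's text, and a match may start at any offset inside a node. The names `char_graph` and `match_ends`, and the dictionary-based representation, are hypothetical choices made for this example.

```python
from typing import Dict, List, Set, Tuple

# A position is (node id, character offset inside that node's text).
Position = Tuple[str, int]


def char_graph(texts: Dict[str, str],
               links: Dict[str, List[str]]) -> Dict[Position, List[Position]]:
    """Expand every node's text into a chain of one-character positions.

    Assumption: a link is taken only from the last character of a node
    and always enters the target node at offset 0.
    """
    succ: Dict[Position, List[Position]] = {}
    for node, text in texts.items():
        for i in range(len(text)):
            if i + 1 < len(text):
                succ[(node, i)] = [(node, i + 1)]
            else:
                # Last character: continue into linked nodes (skip empty ones).
                succ[(node, i)] = [(w, 0) for w in links.get(node, []) if texts[w]]
    return succ


def match_ends(texts: Dict[str, str],
               links: Dict[str, List[str]],
               pattern: str) -> Set[Position]:
    """Return every position at which an occurrence of `pattern` can end."""
    if not pattern:
        return set()
    succ = char_graph(texts, links)
    # Positions whose character equals the first pattern character.
    current = {p for p in succ if texts[p[0]][p[1]] == pattern[0]}
    # Advance one pattern character at a time along the character graph.
    for c in pattern[1:]:
        current = {q for p in current for q in succ[p] if texts[q[0]][q[1]] == c}
    return current


# Hypothetical three-node hypertext containing a cycle a -> b -> a.
texts = {"a": "hyper", "b": "text", "c": "link"}
links = {"a": ["b", "c"], "b": ["a"]}
print(match_ends(texts, links, "pertex"))
```

Each round keeps only the set of positions where a prefix of the pattern can end, so the sketch tolerates cycles in the link structure without enumerating paths.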