A BIT-PARALLEL ALGORITHM FOR SEQUENTIAL PATTERN MATCHING WITH WILDCARDS
Pattern matching with both gap constraints and the one-off condition is a challenging topic, especially in bioinformatics, information retrieval, and dictionary query. Among the algorithms to solve the problem, the most efficient one is SAIL, which is time consuming, especially when the pattern is l...
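As a rough illustration of the bit-parallelism named in the title, the Python sketch below shows classic Shift-And matching extended with single-character wildcards. It is not the BPBM algorithm from the article: it handles only fixed-width wildcards, with no variable gap constraints and no one-off condition, so occurrences may overlap. The function name `shift_and_wildcard` and the `?` wildcard symbol are assumptions chosen for this example.

```python
def shift_and_wildcard(pattern: str, text: str, wildcard: str = "?") -> list[int]:
    """Return start positions where `pattern` occurs in `text`.

    Each `wildcard` character in the pattern matches exactly one
    arbitrary text character. Illustrative sketch only.
    """
    m = len(pattern)
    if m == 0:
        return []

    # Bit i of a mask is set when pattern position i accepts the character.
    wild_mask = 0      # positions that accept any character
    char_masks = {}    # per-character masks for literal positions
    for i, ch in enumerate(pattern):
        if ch == wildcard:
            wild_mask |= 1 << i
        else:
            char_masks[ch] = char_masks.get(ch, 0) | (1 << i)

    accept = 1 << (m - 1)  # bit of the final NFA state
    state = 0              # bit i set: pattern[0..i] matches a suffix of the text read so far
    starts = []
    for pos, ch in enumerate(text):
        # Advance every active state by one and start a new match attempt,
        # then keep only the states whose pattern position accepts `ch`.
        state = ((state << 1) | 1) & (char_masks.get(ch, 0) | wild_mask)
        if state & accept:
            starts.append(pos - m + 1)
    return starts


print(shift_and_wildcard("a?c", "abcaxcac"))  # [0, 3]
```

All NFA states are updated with one shift and one AND per text character, which is the property that bit-parallel matchers exploit. Python integers are unbounded, so the sketch is not limited to a machine word, unlike a typical C implementation.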
| Published in: | Cybernetics and systems Vol. 42; no. 6; pp. 382 - 401 |
|---|---|
| Main Authors: | Guo, Dan; Hong, Xiao-Li; Hu, Xue-Gang; Gao, Jun; Liu, Ying-Ling; Wu, Gong-Qing; Wu, Xindong |
| Format: | Journal Article |
| Language: | English |
| Published: | Taylor & Francis Group, 01.08.2011 |
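| Subjects: | bit-parallelism; length constraints; nondeterministic automatons; one-off condition; pattern matching; wildcards |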
| ISSN: | 0196-9722, 1087-6553 |
| Abstract | Pattern matching with both gap constraints and the one-off condition is a challenging topic, especially in bioinformatics, information retrieval, and dictionary query. Among the algorithms that solve the problem, the most efficient is SAIL, which is nevertheless time consuming, especially when the pattern is long. In addition, existing algorithms based on bit-parallelism cannot handle a pattern that has only one pattern character between successive wildcards and whose minimum local length constraints are zero. We propose an algorithm, BPBM, to handle online sequential pattern matching. In BPBM, an extended bit-parallelism operation is used to accelerate the matching process. An effective transition window mechanism with two nondeterministic finite state automata (NFAs) is adopted to drop useless scan windows. It identifies gap constraints automatically and scans only once to export occurrences with exact match positions. Theoretical analysis and experimental results show that BPBM is more competitive than its peers. It has an absolute advantage in search time complexity. It is also more stable, with operation costs decreasing as the size of the sequence alphabet or the length of the pattern increases. We also study off-line pattern matching. With two pruning passes, left-most and right-most, we can increase the matching ratio by about 2.08% on average. |
| Author | Hong, Xiao-Li; Hu, Xue-Gang; Wu, Gong-Qing; Liu, Ying-Ling; Guo, Dan; Gao, Jun; Wu, Xindong |
| Author_xml | 1. Guo, Dan (College of Computer Science and Information Engineering, Hefei University of Technology); 2. Hong, Xiao-Li (College of Computer Science and Information Engineering, Hefei University of Technology); 3. Hu, Xue-Gang (College of Computer Science and Information Engineering, Hefei University of Technology); 4. Gao, Jun (College of Computer Science and Information Engineering, Hefei University of Technology); 5. Liu, Ying-Ling (School of Physics, University of Science and Technology of China); 6. Wu, Gong-Qing (College of Computer Science and Information Engineering, Hefei University of Technology); 7. Wu, Xindong (Department of Computer Science, University of Vermont; email: xwu@hfut.edu.cn) |
| ContentType | Journal Article |
| Copyright | Copyright Taylor & Francis Group, LLC 2011 |
| DOI | 10.1080/01969722.2011.600651 |
| Discipline | Sciences (General) |
| EISSN | 1087-6553 |
| EndPage | 401 |
| ISICitedReferencesCount | 7 |
| ISSN | 0196-9722 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 6 |
| Language | English |
| PageCount | 20 |
| PublicationCentury | 2000 |
| PublicationDate | 2011-08-00 |
| PublicationDateYYYYMMDD | 2011-08-01 |
| PublicationDecade | 2010 |
| PublicationTitle | Cybernetics and systems |
| PublicationYear | 2011 |
| Publisher | Taylor & Francis Group |
| StartPage | 382 |
| SubjectTerms | bit-parallelism; length constraints; nondeterministic automatons; one-off condition; pattern matching; wildcards |
| Title | A BIT-PARALLEL ALGORITHM FOR SEQUENTIAL PATTERN MATCHING WITH WILDCARDS |
| URI | https://www.tandfonline.com/doi/abs/10.1080/01969722.2011.600651 |
| Volume | 42 |