A BIT-PARALLEL ALGORITHM FOR SEQUENTIAL PATTERN MATCHING WITH WILDCARDS


Bibliographic Details
Published in:Cybernetics and systems Vol. 42; no. 6; pp. 382 - 401
Main Authors: Guo, Dan, Hong, Xiao-Li, Hu, Xue-Gang, Gao, Jun, Liu, Ying-Ling, Wu, Gong-Qing, Wu, Xindong
Format: Journal Article
Language:English
Published: Taylor & Francis Group 01.08.2011
ISSN:0196-9722, 1087-6553
Abstract Pattern matching with both gap constraints and the one-off condition is a challenging problem, especially in bioinformatics, information retrieval, and dictionary query. Among the algorithms that solve it, the most efficient is SAIL, which is still time consuming, especially when the pattern is long. In addition, existing bit-parallel algorithms cannot handle a pattern that has only one pattern character between successive wildcards and whose minimum local length constraints are zero. We propose an algorithm, BPBM, for online sequential pattern matching. BPBM uses an extended bit-parallel operation to accelerate the matching process and adopts an effective transition window mechanism with two nondeterministic finite automata (NFAs) to discard useless scan windows. It identifies gap constraints automatically and scans the sequence only once to report occurrences with their exact match positions. Theoretical analysis and experimental results show that BPBM is more competitive than its peers. It has an absolute advantage in search time complexity. It is also more stable: its operation costs decrease as the size of the sequence alphabet or the length of the pattern increases. We also study off-line pattern matching. With two pruning passes, left-most and right-most, the matching ratio increases by about 2.08% on average.
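As background for the bit-parallelism mentioned in the abstract, the following is a minimal sketch of the classic Shift-And technique, which simulates a pattern NFA with one machine word per text character. It is not BPBM and does not implement gap constraints or the one-off condition; the function name `shift_and_wild` and the use of `?` as a single-character wildcard are assumptions made only for this illustration.

```python
def shift_and_wild(pattern: str, text: str, wildcard: str = "?") -> list[int]:
    """Return the end positions in `text` of every match of `pattern`.

    Illustrative Shift-And sketch (hypothetical helper, not the BPBM algorithm).
    Bit i of `state` is set when pattern[0..i] matches a suffix of the text
    read so far. `wildcard` matches exactly one arbitrary character.
    """
    m = len(pattern)
    if m == 0:
        return []

    masks: dict[str, int] = {}  # masks[c] has bit i set when pattern[i] == c
    wild_mask = 0               # bits of pattern positions holding a wildcard
    for i, ch in enumerate(pattern):
        if ch == wildcard:
            wild_mask |= 1 << i
        else:
            masks[ch] = masks.get(ch, 0) | (1 << i)

    accept = 1 << (m - 1)       # bit of the final NFA state
    state = 0
    ends = []
    for pos, ch in enumerate(text):
        # Advance every active state by one position, start a new attempt at
        # bit 0, then keep only the states whose pattern character accepts ch.
        state = ((state << 1) | 1) & (masks.get(ch, 0) | wild_mask)
        if state & accept:
            ends.append(pos)
    return ends


if __name__ == "__main__":
    print(shift_and_wild("a?c", "abcaxc"))
```

Each text character costs a constant number of word operations as long as the pattern fits in a machine word, which is the general source of the speedup that bit-parallel matchers rely on.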
Author Affiliations:
Guo, Dan: College of Computer Science and Information Engineering, Hefei University of Technology
Hong, Xiao-Li: College of Computer Science and Information Engineering, Hefei University of Technology
Hu, Xue-Gang: College of Computer Science and Information Engineering, Hefei University of Technology
Gao, Jun: College of Computer Science and Information Engineering, Hefei University of Technology
Liu, Ying-Ling: School of Physics, University of Science and Technology of China
Wu, Gong-Qing: College of Computer Science and Information Engineering, Hefei University of Technology
Wu, Xindong (xwu@hfut.edu.cn): Department of Computer Science, University of Vermont
ContentType Journal Article
Copyright Copyright Taylor & Francis Group, LLC 2011
DOI 10.1080/01969722.2011.600651
Discipline Sciences (General)
IsPeerReviewed true
IsScholarly true
PageCount 20
SubjectTerms bit-parallelism
length contraints
nondeterministic automatons
one-off condition
pattern matching
wildcards
URI https://www.tandfonline.com/doi/abs/10.1080/01969722.2011.600651