A BIT-PARALLEL ALGORITHM FOR SEQUENTIAL PATTERN MATCHING WITH WILDCARDS

Bibliographic Details
Published in: Cybernetics and Systems, Vol. 42, no. 6, pp. 382-401
Main Authors: Guo, Dan, Hong, Xiao-Li, Hu, Xue-Gang, Gao, Jun, Liu, Ying-Ling, Wu, Gong-Qing, Wu, Xindong
Format: Journal Article
Language: English
Published: Taylor & Francis Group 01.08.2011
ISSN: 0196-9722, 1087-6553
Description
Summary: Pattern matching with both gap constraints and the one-off condition is a challenging problem, particularly in bioinformatics, information retrieval, and dictionary query. Among the existing algorithms for this problem, the most efficient is SAIL, but it is still time consuming, especially when the pattern is long. In addition, existing algorithms based on bit-parallelism cannot handle patterns in which only one pattern character lies between successive wildcards and the minimum local length constraints are zero. We propose an algorithm, BPBM, for online sequential pattern matching. BPBM uses an extended bit-parallelism operation to accelerate the matching process and adopts a transition window mechanism with two nondeterministic finite-state automata (NFAs) to discard useless scan windows. It identifies gap constraints automatically and scans the sequence only once to output occurrences with exact match positions. Theoretical analysis and experimental results show that BPBM is more competitive than its peers. It has an absolute advantage in search time complexity. It is also more stable: its operation costs decrease as the size of the sequence alphabet or the length of the pattern increases. We also study offline pattern matching. By pruning twice, left-most and right-most, we increase the matching ratio by about 2.08% on average.
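
To make the problem statement in the summary concrete, the following is a minimal sketch of matching under gap constraints and the one-off condition. It is not BPBM or SAIL: it is a naive left-most greedy search with backtracking, with no bit-parallelism. The function name `find_one_off_occurrences`, its parameters, and the reading of the one-off condition (each sequence position may match a pattern character in at most one occurrence, while positions covered by wildcards are not consumed) are assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def find_one_off_occurrences(
    text: str,
    chars: str,
    gaps: Sequence[Tuple[int, int]],
) -> List[Tuple[int, ...]]:
    """Naive illustration only (hypothetical helper, not the BPBM algorithm).

    chars[j] and chars[j + 1] must be separated by at least gaps[j][0] and
    at most gaps[j][1] wildcard positions. Assumed one-off condition: a text
    position may match a pattern character in at most one occurrence.
    """
    if not chars or len(gaps) != len(chars) - 1:
        raise ValueError("need a non-empty pattern and len(chars) - 1 gaps")

    used = [False] * len(text)

    def extend(j: int, pos: int) -> Optional[List[int]]:
        # chars[j] is placed at pos; try to place the remaining characters,
        # preferring the smallest admissible gap (left-most choice).
        if j == len(chars) - 1:
            return [pos]
        lo, hi = gaps[j]
        last = min(pos + hi + 1, len(text) - 1)
        for nxt in range(pos + lo + 1, last + 1):
            if not used[nxt] and text[nxt] == chars[j + 1]:
                tail = extend(j + 1, nxt)
                if tail is not None:
                    return [pos] + tail
        return None

    occurrences: List[Tuple[int, ...]] = []
    for start in range(len(text)):
        if used[start] or text[start] != chars[0]:
            continue
        occ = extend(0, start)
        if occ is not None:
            for p in occ:
                used[p] = True  # consume the matched positions
            occurrences.append(tuple(occ))
    return occurrences
```

For the pattern "A, then 0 to 1 wildcards, then C", the call `find_one_off_occurrences("ACC", "AC", [(0, 1)])` reports a single occurrence, because position 0 can be used only once even though both `C` positions satisfy the gap constraint. A greedy search like this is not guaranteed to find the maximum number of occurrences.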
DOI: 10.1080/01969722.2011.600651