Faster query algorithms for the text fingerprinting problem

Bibliographic Details
Published in: Information and Computation, Vol. 209, no. 7, pp. 1057-1069
Main Authors: Chan, Chi-Yuan; Yu, Hung-I; Hon, Wing-Kai; Wang, Biing-Feng
Format: Journal Article
Language: English
Published: Amsterdam: Elsevier Inc., 01.07.2011
ISSN: 0890-5401, 1090-2651
Description
Summary: Let S be a string over a finite, ordered alphabet Σ. For any substring S′ of S, the set of distinct characters contained in S′ is called its fingerprint. The text fingerprinting indexing problem is to construct a data structure for S in advance so that, given any input subset C of Σ, the following queries can be answered efficiently: (1) determine whether C is the fingerprint of some substring of S; (2) find all maximal substrings of S whose fingerprint is C. The best previously known results answer these two queries in Θ(|Σ|) and Θ(|Σ| + K) time, respectively, where K is the number of maximal substrings. In this paper, we propose two improved algorithms for the text fingerprinting indexing problem. The first answers the two queries in O(|C| log n) and O(|C| log n + K) time, respectively. The second further reduces the query times to O(|C| log(|Σ|/|C|)) and O(|C| log(|Σ|/|C|) + K). Both results answer an open problem posed by Amir et al.
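
To make the two queries concrete, here is a minimal brute-force sketch in Python. It is not the indexing method of the paper: it scans S once per query, taking time linear in the length of S, and builds no index. The function names are hypothetical. The sketch also assumes the usual reading of "maximal", which the summary does not spell out: an occurrence is maximal if it cannot be extended to the left or right without changing its fingerprint. Treating the empty set as matching nothing is a convention chosen here.

```python
def maximal_locations(s, c):
    """Return (start, end) index pairs, inclusive, of the maximal
    substrings of s whose fingerprint is exactly the set c.

    Assumption: "maximal" means the occurrence is bounded on both sides
    by a character outside c or by an end of the string.
    """
    target = frozenset(c)
    found = []
    if not target:
        # Convention for this sketch: the empty set matches nothing.
        return found
    i, n = 0, len(s)
    while i < n:
        if s[i] not in target:
            i += 1
            continue
        # Extend a run made only of characters from the target set.
        j = i
        seen = set()
        while j < n and s[j] in target:
            seen.add(s[j])
            j += 1
        # The run qualifies only if it uses every character of the target.
        if seen == target:
            found.append((i, j - 1))
        i = j
    return found


def is_fingerprint(s, c):
    """Query (1): is c the fingerprint of some substring of s?"""
    return bool(maximal_locations(s, c))
```

For example, with S = "aabcbad" and C = {a, b}, `maximal_locations` reports the two occurrences at positions 0..2 ("aab") and 4..5 ("ba"), while C = {a, c} is not a fingerprint of any substring because every "a" and "c" in S is separated by a "b". The algorithms summarized above avoid this per-query scan by preprocessing S.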
DOI:10.1016/j.ic.2011.04.001