Faster query algorithms for the text fingerprinting problem

Bibliographic details
Published in: Information and Computation, Vol. 209, No. 7, pp. 1057-1069
Main authors: Chan, Chi-Yuan; Yu, Hung-I; Hon, Wing-Kai; Wang, Biing-Feng
Format: Journal Article
Language: English
Publication details: Amsterdam, Elsevier Inc, 1 July 2011
Elsevier
ISSN:0890-5401, 1090-2651
Description
Summary: Let S be a string over a finite, ordered alphabet Σ. For any substring S′ of S, the set of distinct characters contained in S′ is called its fingerprint. The text fingerprinting indexing problem is to construct a data structure for S in advance so that, given any input subset C of Σ, we can answer the following queries efficiently: (1) determine whether C is the fingerprint of some substring of S; (2) find all maximal substrings of S whose fingerprint is C. The best known results solved these two queries in Θ(|Σ|) and Θ(|Σ| + K) time, respectively, where K is the number of maximal substrings. In this paper, we propose two improved algorithms for the text fingerprinting indexing problem. The first solves the two queries in O(|C| log n) and O(|C| log n + K) time, respectively. For the second, the query time complexities are further reduced to O(|C| log(|Σ|/|C|)) and O(|C| log(|Σ|/|C|) + K). Both results answer an open problem proposed by Amir et al.
DOI:10.1016/j.ic.2011.04.001
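
To make the two queries in the summary concrete, here is a minimal brute-force sketch in Python. It is not the paper's method: it builds no index and rescans the whole string on every query, so it only illustrates the definitions. The function names `maximal_substrings` and `is_fingerprint` are hypothetical. The sketch also assumes that a substring is "maximal" when it cannot be extended by one character to the left or right without changing its fingerprint.

```python
def maximal_substrings(s, c):
    """Return inclusive (start, end) index pairs of the maximal substrings
    of s whose fingerprint (set of distinct characters) is exactly c.

    Assumption: "maximal" means the substring is bounded on both sides by
    a character outside c or by an end of the string.
    """
    target = set(c)
    results = []
    if not target:
        return results

    start = None  # start index of the current run of characters from c
    for i, ch in enumerate(s):
        if ch in target:
            if start is None:
                start = i
        elif start is not None:
            # The run s[start:i] just ended; keep it only if it uses
            # every character of c.
            if set(s[start:i]) == target:
                results.append((start, i - 1))
            start = None

    # Handle a run that reaches the end of the string.
    if start is not None and set(s[start:]) == target:
        results.append((start, len(s) - 1))
    return results


def is_fingerprint(s, c):
    """Query (1): is c the fingerprint of at least one substring of s?"""
    return bool(maximal_substrings(s, c))
```

For example, with `s = "abacabc"` and `c = {"a", "b"}`, the maximal substrings are `"aba"` at positions 0 to 2 and `"ab"` at positions 4 to 5. Each call scans all of `s`, which is the per-query cost that an index built in advance is meant to avoid.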