Faster query algorithms for the text fingerprinting problem

Bibliographic details
Published in:	Information and Computation, Volume 209, Issue 7, pp. 1057-1069
Main authors:	Chan, Chi-Yuan; Yu, Hung-I; Hon, Wing-Kai; Wang, Biing-Feng
Format:	Journal Article
Language:	English
Publication details:	Amsterdam: Elsevier Inc, 1 July 2011
Subjects:	Algorithmics. Computability. Computer arithmetics | Applied sciences | Combinatorial algorithms on words | Computer science; control theory; systems | Data processing. List processing. Character string processing | Exact sciences and technology | Fingerprints | Memory organisation. Data processing | Miscellaneous | Patricia trie | Software | Text indexing | Theoretical computing | Input Word | Computer theory | Query | Combinatorial algorithm | Data structure | Time complexity | Indexing
ISSN:	0890-5401, 1090-2651

Description
Summary:	Let $S$ be a string over a finite, ordered alphabet $\Sigma$. For any substring $S'$ of $S$, the set of distinct characters contained in $S'$ is called its fingerprint. The text fingerprinting indexing problem is to construct a data structure for the string $S$ in advance so that, given any input subset $C$ of $\Sigma$, we can answer the following queries efficiently: (1) determine whether $C$ is the fingerprint of some substring of $S$; (2) find all maximal substrings of $S$ whose fingerprint is $C$. The best previously known results answer these two queries in $\Theta(|\Sigma|)$ and $\Theta(|\Sigma| + K)$ time, respectively, where $K$ is the number of maximal substrings. In this paper, we propose two improved algorithms for the text fingerprinting indexing problem. The first one answers the two queries in $O(|C| \log n)$ and $O(|C| \log n + K)$ time, respectively. For the second one, the query time complexities are further reduced to $O(|C| \log(|\Sigma| / |C|))$ and $O(|C| \log(|\Sigma| / |C|) + K)$. Both results answer an open problem posed by Amir et al.
DOI:	10.1016/j.ic.2011.04.001
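
To make the two queries in the summary concrete, here is a minimal brute-force sketch in Python. It is not the indexed algorithm from the paper: it builds no data structure and simply rescans the string for every query, so each query costs time linear in the length of the string. The function names `maximal_locations` and `is_fingerprint` are hypothetical, and the sketch assumes that a substring is maximal when it cannot be extended to the left or right without leaving the character set C.

```python
def maximal_locations(s, c):
    """Return inclusive (start, end) index pairs of the maximal substrings
    of s whose fingerprint (set of distinct characters) is exactly c.

    Naive baseline: one linear scan of s per query, no preprocessing.
    """
    c = frozenset(c)
    result = []
    i, n = 0, len(s)
    while i < n:
        if s[i] not in c:
            i += 1
            continue
        # Extend to the longest run starting at i that uses only characters of c.
        j = i
        while j + 1 < n and s[j + 1] in c:
            j += 1
        # The run is bounded by characters outside c (or by the string ends),
        # so it is maximal; report it only if it uses every character of c.
        if set(s[i:j + 1]) == c:
            result.append((i, j))
        i = j + 1
    return result


def is_fingerprint(s, c):
    """Query (1): is c the fingerprint of at least one substring of s?"""
    return bool(maximal_locations(s, c))


if __name__ == "__main__":
    text = "abcab"
    print(maximal_locations(text, {"a", "b"}))
    print(is_fingerprint(text, {"a", "d"}))
```

Any substring with fingerprint C lies inside one run of characters drawn from C, and that run then has fingerprint C as well, so checking the runs is enough to answer both queries. The point of the indexed algorithms described in the summary is to avoid this per-query scan of the whole string.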