Character sets of strings


Bibliographic details
Published in: Journal of discrete algorithms (Amsterdam, Netherlands), Vol. 5, No. 2, pp. 330-340
Main authors: Didier, Gilles; Schmidt, Thomas; Stoye, Jens; Tsur, Dekel
Format: Journal Article
Language: English
Published: Elsevier B.V., 1 June 2007
ISSN: 1570-8667, 1570-8675
Description
Summary: Given a string S over a finite alphabet Σ, the character set (also called the fingerprint) of a substring S′ of S is the subset C ⊆ Σ of the symbols occurring in S′. The study of the character sets of all the substrings of a given string (or a given collection of strings) appears in several domains such as rule induction for natural language processing or comparative genomics. Several computational problems concerning the character sets of a string arise from these applications, especially: (1) Output all the maximal locations of substrings having a given character set. (2) Output for each character set C occurring in a given string (or a given collection of strings) all the maximal locations of C. Denoting by n the total length of the considered string or collection of strings, we solve the first problem in Θ(n) time using Θ(n) space. We present two algorithms solving the second problem. The first one runs in Θ(n²) time using Θ(n) space. The second algorithm has Θ(n |Σ| log |Σ|) time and Θ(n) space complexity and is an adaptation of an algorithm by Amir et al. [A. Amir, A. Apostolico, G.M. Landau, G. Satta, Efficient text fingerprinting via Parikh mapping, J. Discrete Algorithms 26 (2003) 1–13].
DOI: 10.1016/j.jda.2006.03.021
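
Illustration: the two problems named in the summary can be made concrete with a small Python sketch. It is not the authors' algorithm, only a simple way to see what the outputs look like. It assumes that a location (i, j) of a character set C is maximal when the substring cannot be extended to the left or to the right without adding a symbol outside C; this reading of "maximal location" is an assumption, since the record does not define the term. The function names and the 0-based inclusive index pairs are hypothetical choices made for this example.

```python
from collections import defaultdict


def maximal_locations(s, charset):
    """Problem (1), sketched: maximal locations of one given character set.

    Scans s once, cutting it into maximal runs of symbols that belong to
    the target set, and keeps the runs that use every symbol of the set.
    Returns 0-based inclusive (start, end) pairs.
    """
    target = set(charset)
    locations = []
    if not target:
        return locations
    start = None   # start index of the run being scanned, if any
    seen = set()   # symbols of the target set seen in the current run
    for i in range(len(s) + 1):
        inside = i < len(s) and s[i] in target
        if inside:
            if start is None:
                start = i
                seen = set()
            seen.add(s[i])
        elif start is not None:
            # The run ended at i - 1; it is a location only if complete.
            if seen == target:
                locations.append((start, i - 1))
            start = None
    return locations


def all_maximal_locations(s):
    """Problem (2), sketched by brute force over all substrings.

    Inspects a quadratic number of (start, end) pairs and records a pair
    when neither neighbouring symbol belongs to the current character set.
    The result table is kept in memory, so this sketch makes no attempt
    to match the space bounds quoted in the summary.
    """
    n = len(s)
    table = defaultdict(list)
    for i in range(n):
        current = set()
        for j in range(i, n):
            current.add(s[j])
            left_ok = i == 0 or s[i - 1] not in current
            right_ok = j == n - 1 or s[j + 1] not in current
            if left_ok and right_ok:
                table[frozenset(current)].append((i, j))
    return dict(table)
```

For the string "abcabxba" and the character set {a, b}, the first function reports the three runs "ab", "ab" and "ba" at positions (0, 1), (3, 4) and (6, 7); the second function lists the same positions under the key frozenset("ab"), alongside the locations of every other character set that occurs.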