Character sets of strings
| Published in: | Journal of discrete algorithms (Amsterdam, Netherlands), Vol. 5, No. 2, pp. 330-340 |
|---|---|
| Main authors: | , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V, 1 June 2007 |
| Subjects: | |
| ISSN: | 1570-8667, 1570-8675 |
| Online access: | Get full text |
| Summary: | Given a string S over a finite alphabet Σ, the character set (also called the fingerprint) of a substring S′ of S is the subset C ⊆ Σ of the symbols occurring in S′. The study of the character sets of all the substrings of a given string (or a given collection of strings) appears in several domains such as rule induction for natural language processing or comparative genomics. Several computational problems concerning the character sets of a string arise from these applications, especially: (1) Output all the maximal locations of substrings having a given character set. (2) Output for each character set C occurring in a given string (or a given collection of strings) all the maximal locations of C. Denoting by n the total length of the considered string or collection of strings, we solve the first problem in Θ(n) time using Θ(n) space. We present two algorithms solving the second problem. The first one runs in Θ(n²) time using Θ(n) space. The second algorithm has Θ(n\|Σ\| log \|Σ\|) time and Θ(n) space complexity and is an adaptation of an algorithm by Amir et al. [A. Amir, A. Apostolico, G.M. Landau, G. Satta, Efficient text fingerprinting via Parikh mapping, J. Discrete Algorithms 26 (2003) 1–13]. |
|---|---|
| DOI: | 10.1016/j.jda.2006.03.021 |
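
To make problem (1) from the summary concrete, here is a minimal Python sketch of a single left-to-right scan that reports the maximal locations of a given character set C. The record does not define "maximal location", so the sketch assumes the usual reading: an interval [i, j] whose character set is exactly C and which cannot be extended on either side without leaving C. The function name `maximal_locations` and the inclusive index pairs are illustrative choices, not taken from the article, and the code is not claimed to be the authors' algorithm.

```python
def maximal_locations(s, charset):
    """Return inclusive (start, end) pairs of maximal locations of `charset` in `s`.

    Assumption: [i, j] is a maximal location of C when the set of symbols in
    s[i..j] equals C and neither neighbouring symbol (if any) belongs to C.
    """
    target = set(charset)
    result = []
    seen = set()      # symbols of C met in the current run
    start = None      # start index of the current run of symbols from C

    for i, ch in enumerate(s):
        if ch in target:
            if start is None:
                start = i
                seen = set()
            seen.add(ch)
        else:
            # A symbol outside C closes the current run, if there is one.
            # seen is a subset of target, so equal sizes mean equal sets.
            if start is not None and len(seen) == len(target):
                result.append((start, i - 1))
            start = None

    # Close a run that reaches the end of the string.
    if start is not None and len(seen) == len(target):
        result.append((start, len(s) - 1))
    return result


if __name__ == "__main__":
    print(maximal_locations("abacbcab", "ab"))
```

Each symbol is visited once and the set operations take expected constant time, so the scan is linear in the length of the string under that assumption, in line with the Θ(n) bound quoted in the summary. A collection of strings could be handled by running the scan on each string separately.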