Improved characters distance sampling for online and offline text searching

Detailed bibliography
Published in: Theoretical Computer Science Vol. 946; p. 113684
Main authors: Faro, Simone; Marino, Francesco Pio; Pavone, Arianna
Format: Journal Article
Language: English
Published: Elsevier B.V. 10.02.2023
ISSN:0304-3975
Description
Summary: Sampled string matching is a very effective technique for reducing the time needed to search for a pattern within a text, at the cost of a small amount of additional memory used to store a partial index of the text. This approach has recently received some interest and has been applied to both online and offline string matching, improving on standard solutions by more than 50%. However, this improvement is currently achievable only for texts over large alphabets, and remains small (or absent) for small alphabets. In this article we propose an extension of the text-sampling approach known as Character Distance Sampling to the case of small alphabets, obtaining an improvement of up to 98% over standard solutions in online string matching. We also extend this approach to offline string matching by introducing a sampled version of the suffix array, obtaining searches up to 5 times faster than on the standard suffix array. Unlike previous solutions, our idea is not based on reducing the number of indexed suffixes, but on constructing the index directly on the sampled text.
• We extend the Character Distance Sampling approach to the case of small alphabets by making use of condensed alphabets.
• We propose a way of constructing a sampled version of the suffix array to speed up offline searching.
• Our approach reduces space consumption by between 72% and 95% compared with previous solutions.
• Our approach obtains a speed-up of up to 98% in online string matching.
• Our approach obtains a speed-up of up to 5 times in offline string matching.
DOI:10.1016/j.tcs.2022.12.034
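
The summary names Character Distance Sampling without spelling out its mechanics. The sketch below is one plausible reading, not the authors' implementation: it assumes a single pivot character, keeps the positions of that character as the partial index, compares the distances between consecutive pivot occurrences in the pattern and in the text, and verifies each candidate against the original text. The names `sample_positions`, `distances` and `cds_search` are hypothetical, and the condensed-alphabet extension and the sampled suffix array described in the record are not covered.

```python
def sample_positions(text, pivot):
    """Positions of the pivot character in text (the partial index)."""
    return [i for i, ch in enumerate(text) if ch == pivot]


def distances(positions):
    """Distances between consecutive pivot occurrences."""
    return [b - a for a, b in zip(positions, positions[1:])]


def cds_search(text, pattern, pivot):
    """Return every start position of pattern in text.

    Assumption: one pivot character; candidates come from matching
    distance sequences and are then verified on the original text.
    """
    n, m = len(text), len(pattern)
    text_pos = sample_positions(text, pivot)
    pat_pos = sample_positions(pattern, pivot)

    if not pat_pos:
        # The pattern has no pivot, so the index cannot help:
        # fall back to a plain scan.
        return [i for i in range(n - m + 1) if text.startswith(pattern, i)]

    text_dist = distances(text_pos)
    pat_dist = distances(pat_pos)
    k = len(pat_dist)
    first = pat_pos[0]  # offset of the first pivot inside the pattern

    hits = []
    # A candidate uses pivot occurrences j .. j + k of the text.
    for j in range(len(text_pos) - k):
        if text_dist[j:j + k] == pat_dist:
            start = text_pos[j] - first
            # Verification step: the sampled match is only a filter.
            if start >= 0 and text.startswith(pattern, start):
                hits.append(start)
    return hits


print(cds_search("abracadabra", "abra", "b"))  # [0, 7]
```

In this sketch the pivot positions would be computed once per text and reused across queries, which is where the "small amount of additional memory" in the summary would be spent; a rarer pivot gives a smaller index but falls back to a plain scan more often.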