Non-overlapping indexing in BWT-runs bounded space

Published in: Theoretical Computer Science, Volume 1056, p. 115512
Main authors: Gibney, Daniel; MacNichol, Paul; Thankachan, Sharma V.
Format: Journal Article
Language: English
Publication details: Elsevier B.V., 21 November 2025
ISSN: 0304-3975
Description
Summary: We revisit the non-overlapping indexing problem for an efficient repetition-aware solution. The problem is to index a text T[1..n] such that, whenever a pattern P[1..p] arrives as a query, we can report the largest set of non-overlapping occurrences of P in T. A previous index by Cohen and Porat [ISAAC 2009] takes linear space and optimal O(p + occ_no) query time, where occ_no denotes the output size. We present an index of size O(r), where r denotes the number of runs in the Burrows-Wheeler Transform (BWT) of T. The parameter r is significantly smaller than n for highly repetitive texts. The query time of our index is O(p log log_w σ + sort(occ_no)), where σ denotes the alphabet size, w denotes the machine word size in bits, and sort(x) denotes the time for sorting x integers within the range [1, n]. We also study the counting version of this problem.
DOI: 10.1016/j.tcs.2025.115512
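
To make the problem statement in the summary concrete, the following is a minimal Python sketch of what a query is expected to return. It is not the index described in the article: it scans the text naively and uses no BWT-based structure. It assumes the standard greedy argument that taking the leftmost occurrence and then repeatedly the next occurrence starting at least p positions later yields a largest non-overlapping set. The function names `all_occurrences` and `max_non_overlapping` are hypothetical, and positions are 0-based here, unlike the 1-based T[1..n] of the summary.

```python
def all_occurrences(text, pattern):
    """Return the sorted 0-based start positions of every occurrence,
    including occurrences that overlap each other (naive scan)."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        # Advance by one character so overlapping occurrences are found too.
        start = text.find(pattern, start + 1)
    return positions


def max_non_overlapping(text, pattern):
    """Greedily pick a largest set of pairwise non-overlapping occurrences.

    Assumption (not taken from the article): a left-to-right greedy pass
    over the sorted occurrences is sufficient to obtain a maximum set.
    """
    if not pattern:
        return []
    chosen = []
    next_free = 0  # first text position not covered by a chosen occurrence
    for pos in all_occurrences(text, pattern):
        if pos >= next_free:
            chosen.append(pos)
            next_free = pos + len(pattern)
    return chosen


if __name__ == "__main__":
    print(max_non_overlapping("aaaaa", "aa"))
    print(max_non_overlapping("abababa", "aba"))
```

In "aaaaa" the pattern "aa" occurs at positions 0, 1, 2 and 3, but at most two of these occurrences can be chosen without overlap, so the output size occ_no can be much smaller than the total number of occurrences.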