Non-overlapping indexing in BWT-runs bounded space

Bibliographic Details
Published in: Theoretical Computer Science, Vol. 1056, p. 115512
Main Authors: Gibney, Daniel; MacNichol, Paul; Thankachan, Sharma V.
Format: Journal Article
Language: English
Published: Elsevier B.V. 21.11.2025
ISSN: 0304-3975
Description
Summary: We revisit the non-overlapping indexing problem, seeking an efficient repetition-aware solution. The problem is to index a text T[1..n] so that, given a query pattern P[1..p], we can report a largest set of non-overlapping occurrences of P in T. A previous index by Cohen and Porat [ISAAC 2009] takes linear space and achieves optimal O(p + occ_no) query time, where occ_no denotes the output size. We present an index of size O(r), where r denotes the number of runs in the Burrows-Wheeler Transform (BWT) of T. The parameter r is significantly smaller than n for highly repetitive texts. The query time of our index is O(p log log_w σ + sort(occ_no)), where σ denotes the alphabet size, w denotes the machine word size in bits, and sort(x) denotes the time for sorting x integers within the range [1, n]. We also study the counting version of this problem.
DOI: 10.1016/j.tcs.2025.115512
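
Illustration: to make the query in the summary concrete, the following minimal sketch computes a largest set of non-overlapping occurrences of P in T by brute force. It is not the index described in the article: it scans the text directly instead of using an O(r)-space structure, and the function names `find_occurrences` and `max_non_overlapping` are hypothetical. It relies on the standard fact that, because all occurrences have the same length p, scanning starting positions from left to right and greedily keeping each occurrence that does not overlap the last one kept yields a maximum-size set.

```python
def find_occurrences(text: str, pattern: str) -> list[int]:
    """Return all 1-based starting positions of pattern in text, overlaps included."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start + 1)  # convert 0-based index to 1-based position
        start = text.find(pattern, start + 1)
    return positions


def max_non_overlapping(text: str, pattern: str) -> list[int]:
    """Greedily pick a largest set of pairwise non-overlapping occurrences.

    Naive baseline only (hypothetical helper, not the article's data structure).
    """
    p = len(pattern)
    if p == 0:
        return []
    chosen = []
    next_free = 1  # smallest position at which the next occurrence may start
    for pos in find_occurrences(text, pattern):  # positions arrive in sorted order
        if pos >= next_free:
            chosen.append(pos)
            next_free = pos + p  # occurrence covers positions pos .. pos + p - 1
    return chosen


if __name__ == "__main__":
    print(max_non_overlapping("aaaaa", "aa"))
    print(max_non_overlapping("abababab", "aba"))
```

For T = "aaaaa" and P = "aa", the pattern occurs at positions 1, 2, 3, and 4, but at most two of these occurrences can be chosen without overlap; the sketch keeps positions 1 and 3.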