Linear-time computation of DAWGs, symmetric indexing structures, and MAWs for integer alphabets
| Published in: | Theoretical Computer Science, vol. 973, p. 114093 |
|---|---|
| Main authors: | , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V., 21 September 2023 |
| ISSN: | 0304-3975, 1879-2294 |
| Online access: | Full text |
| Abstract: | The directed acyclic word graph (DAWG) of a string y of length n is the smallest (partial) DFA that recognizes all suffixes of y; it has only O(n) nodes and edges. In this paper, we show how to construct the DAWG for the input string y from the suffix tree for y in O(n) time for integer alphabets of polynomial size in n. To this end, we first describe a folklore algorithm which, given the suffix tree for y, constructs the DAWG for the reversed string ŷ in O(n) time. Then, we present our algorithm that builds the DAWG for y from the suffix tree for y in O(n) time for integer alphabets. We also show that a straightforward modification of our DAWG construction algorithm leads to the first O(n)-time algorithm for constructing the affix tree of a given string y over an integer alphabet. Affix trees are text indexing structures that support bidirectional pattern searches. We then discuss how our constructions can lead to linear-time algorithms, for integer alphabets, for building other text indexing structures such as linear-size suffix tries and symmetric CDAWGs. As a further application of our O(n)-time DAWG construction algorithm, we show that the set MAW(y) of all minimal absent words (MAWs) of y can be computed in optimal, input- and output-sensitive O(n+|MAW(y)|) time and O(n) working space for integer alphabets. |
|---|---|
| DOI: | 10.1016/j.tcs.2023.114093 |
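To make the DAWG-to-MAW connection in the abstract concrete, here is a minimal sketch. It is not the paper's method: it swaps in the classic online DAWG construction of Blumer et al. with Python dict transitions (expected, not worst-case, linear time) instead of the suffix-tree-based integer-alphabet algorithm. It then uses the standard DAWG characterization of MAWs: every MAW of length at least 2 is s·b, where s is the shortest string of a non-source node q and b labels an edge leaving the suffix-link target of q but not q itself. The function names `build_dawg` and `minimal_absent_words` and the `alphabet` parameter are hypothetical, chosen for illustration only.

```python
# Sketch only: online DAWG (suffix automaton) construction of Blumer et al.,
# NOT the suffix-tree-based O(n) integer-alphabet algorithm of the paper.
# Function names are hypothetical.


def build_dawg(y):
    """Return (length, link, trans, endpos) arrays for the DAWG of y.

    length[s]: length of the longest string in node s
    link[s]:   suffix link of s (-1 for the source node 0)
    trans[s]:  dict mapping a letter to the target node
    endpos[s]: an end position in y shared by every string in node s
    """
    length, link, trans, endpos = [0], [-1], [{}], [-1]
    last = 0
    for i, c in enumerate(y):
        cur = len(length)
        length.append(length[last] + 1)
        link.append(-1)
        trans.append({})
        endpos.append(i)
        p = last
        # Add c-edges to cur along the suffix-link path until one exists.
        while p != -1 and c not in trans[p]:
            trans[p][c] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = trans[p][c]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                # Split node q: clone it with a shorter longest string.
                clone = len(length)
                length.append(length[p] + 1)
                link.append(link[q])
                trans.append(dict(trans[q]))
                endpos.append(endpos[q])
                while p != -1 and trans[p].get(c) == q:
                    trans[p][c] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur
    return length, link, trans, endpos


def minimal_absent_words(y, alphabet=None):
    """Return the sorted list of MAWs of y.

    alphabet (hypothetical parameter) defaults to the letters occurring in y.
    """
    length, link, trans, endpos = build_dawg(y)
    sigma = set(y) if alphabet is None else set(alphabet)
    # Length-1 MAWs: alphabet letters that do not occur in y.
    maws = [b for b in sigma if b not in trans[0]]
    for q in range(1, len(length)):
        p = link[q]
        # Shortest string in node q is one letter longer than the longest in p.
        short_len = length[p] + 1
        e = endpos[q]
        short = y[e - short_len + 1 : e + 1]
        # short[1:] + b occurs (edge from p), short + b does not (no edge from q).
        for b in trans[p]:
            if b not in trans[q]:
                maws.append(short + b)
    return sorted(maws)
```

For example, `minimal_absent_words("abaab")` returns `['aaa', 'aaba', 'bab', 'bb']`. Each letter scanned in the inner loop either also labels an edge out of q (at most O(n) such cases in total) or yields exactly one MAW, which is the intuition behind the O(n+|MAW(y)|) bound when each MAW is reported as a (node, letter) pair. Materializing every word as a Python string, as this sketch does, costs extra time proportional to the total output length.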