Linear-time computation of DAWGs, symmetric indexing structures, and MAWs for integer alphabets

Bibliographic details
Published in: Theoretical computer science, vol. 973, p. 114093
Main authors: Fujishige, Yuta; Tsujimaru, Yuki; Inenaga, Shunsuke; Bannai, Hideo; Takeda, Masayuki
Format: Journal Article
Language: English
Published: Elsevier B.V., 21 September 2023
ISSN: 0304-3975, 1879-2294
Online access: Full text
Description
Summary: The directed acyclic word graph (DAWG) of a string y of length n is the smallest (partial) DFA which recognizes all suffixes of y with only O(n) nodes and edges. In this paper, we show how to construct the DAWG for the input string y from the suffix tree for y, in O(n) time for integer alphabets of polynomial size in n. In so doing, we first describe a folklore algorithm which, given the suffix tree for y, constructs the DAWG for the reversed string ŷ in O(n) time. Then, we present our algorithm that builds the DAWG for y in O(n) time for integer alphabets, from the suffix tree for y. We also show that a straightforward modification to our DAWG construction algorithm leads to the first O(n)-time algorithm for constructing the affix tree of a given string y over an integer alphabet. Affix trees are a text indexing structure supporting bidirectional pattern searches. We then discuss how our constructions can lead to linear-time algorithms for building other text indexing structures, such as linear-size suffix tries and symmetric CDAWGs, in the case of integer alphabets. As a further application of our O(n)-time DAWG construction algorithm, we show that the set MAW(y) of all minimal absent words (MAWs) of y can be computed in optimal, input- and output-sensitive O(n+|MAW(y)|) time and O(n) working space for integer alphabets.
DOI: 10.1016/j.tcs.2023.114093
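
Illustrative note: the two objects defined in the summary, the DAWG of y and the set MAW(y), can be made concrete with a small Python sketch. This is not the paper's method. `build_dawg` uses the classical online suffix-automaton construction rather than the suffix-tree-based algorithm, so it does not carry the integer-alphabet guarantee. `minimal_absent_words` is a brute-force check of the definition, not the O(n+|MAW(y)|) algorithm. All function names and the explicit `alphabet` parameter are assumptions made for this example.

```python
def build_dawg(y):
    """Build the DAWG (suffix automaton) of y with the classical online method.

    Returns (trans, final): trans[s] maps a character to the next state,
    final is the set of accepting states. State 0 is the source.
    """
    trans = [{}]   # outgoing edges of each state
    link = [-1]    # suffix links; -1 means "none" (source only)
    length = [0]   # length of the longest string that reaches each state
    last = 0       # state reached by the whole prefix read so far
    for c in y:
        cur = len(trans)
        trans.append({})
        link.append(-1)
        length.append(length[last] + 1)
        p = last
        while p != -1 and c not in trans[p]:
            trans[p][c] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = trans[p][c]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                # Split q: the clone keeps only the shorter strings.
                clone = len(trans)
                trans.append(dict(trans[q]))
                link.append(link[q])
                length.append(length[p] + 1)
                while p != -1 and trans[p].get(c) == q:
                    trans[p][c] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur
    # Accepting states are those on the suffix-link path from `last`.
    final = set()
    p = last
    while p != -1:
        final.add(p)
        p = link[p]
    return trans, final


def accepts(dawg, w):
    """Return True if w is a suffix of the indexed string (empty suffix included)."""
    trans, final = dawg
    state = 0
    for c in w:
        if c not in trans[state]:
            return False
        state = trans[state][c]
    return state in final


def minimal_absent_words(y, alphabet):
    """Brute-force MAW(y): absent words whose proper substrings all occur in y."""
    n = len(y)
    # All substrings of y, including the empty string.
    factors = {y[i:j] for i in range(n + 1) for j in range(i, n + 1)}
    maws = set()
    for x in factors:            # x = w[:-1] occurs in y by construction
        for b in alphabet:
            w = x + b
            if w not in factors and w[1:] in factors:
                maws.add(w)
    return maws
```

For y = "abaab" over the alphabet {a, b}, `accepts` returns True exactly for the suffixes "abaab", "baab", "aab", "ab", "b" and the empty string, and `minimal_absent_words("abaab", "ab")` returns {"aaa", "aaba", "bab", "bb"}.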