A parameterizable enumeration algorithm for sequence mining

Detailed bibliography
Published in: Theoretical Computer Science, vol. 468, pp. 59–68
Main authors: David, J.; Nourine, L.
Medium: Journal article
Language: English
Published: Elsevier B.V., 14 January 2013
ISSN: 0304-3975, 1879-2294
Description
Abstract: In this paper, we introduce a generic framework for mining sequences under various constraints. More precisely, we study the enumeration of all partitions of a word w into multisets of subsequences. We show that, using additional predicates, this generator can be used for frequent subsequence and substring mining. We define the transition graph T_w, whose vertices are multisets of words and whose arcs are transitions between multisets. We show that T_w is a directed acyclic graph and that it admits a covering tree. We use T_w to propose a generic algorithm that enumerates, without redundancy, all multisets that satisfy a set of predicates.
DOI: 10.1016/j.tcs.2012.11.005
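The enumeration problem described in the abstract can be illustrated with a brute-force sketch. This sketch makes one assumption about the model: a partition of w assigns every position of w to exactly one block, and each block is read left to right as a subsequence. Under that assumption, a multiset of subsequences can be stored as the sorted tuple of those strings. The names set_partitions, word_partitions, Multiset and predicate are hypothetical. The sketch removes duplicate multisets with a set and does not walk the transition graph or its covering tree. It is therefore not the authors' algorithm, and it runs in exponential time even when only a few multisets satisfy the predicate.

```python
from typing import Callable, Iterator

# Hypothetical canonical form of a multiset: the sorted tuple of its subsequences.
Multiset = tuple[str, ...]


def set_partitions(n: int) -> Iterator[list[list[int]]]:
    """Yield every partition of positions 0..n-1 into non-empty blocks."""
    def extend(i: int, blocks: list[list[int]]) -> Iterator[list[list[int]]]:
        if i == n:
            yield [list(b) for b in blocks]  # copy, since blocks is mutated below
            return
        # Put position i into each existing block in turn...
        for k in range(len(blocks)):
            blocks[k].append(i)
            yield from extend(i + 1, blocks)
            blocks[k].pop()
        # ...or open a new block that contains only position i.
        blocks.append([i])
        yield from extend(i + 1, blocks)
        blocks.pop()

    yield from extend(0, [])


def word_partitions(
    w: str, predicate: Callable[[Multiset], bool] = lambda ms: True
) -> Iterator[Multiset]:
    """Yield each distinct multiset of subsequences of w that satisfies predicate.

    Brute force (assumption-based sketch): enumerate all position partitions,
    map each one to a multiset, and drop repeats with a seen-set.
    """
    seen: set[Multiset] = set()
    for blocks in set_partitions(len(w)):
        # Positions are appended in increasing order, so each block reads w left to right.
        ms = tuple(sorted("".join(w[i] for i in b) for b in blocks))
        if ms in seen:  # different position partitions can yield the same multiset
            continue
        seen.add(ms)
        if predicate(ms):
            yield ms


if __name__ == "__main__":
    for ms in word_partitions("aab"):
        print(ms)
```

For example, word_partitions("aab") yields four distinct multisets. This happens because the position partitions {0, 2}, {1} and {0}, {1, 2} both produce the multiset {a, ab}. That is the kind of redundancy the abstract says the algorithm based on the transition graph avoids. A constraint can be passed as a predicate. One example is lambda ms: all(len(s) <= 2 for s in ms), which keeps only multisets whose parts have length at most 2.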