Hybrid ASP-based Approach to Pattern Mining

Bibliographic Details
Published in: Theory and Practice of Logic Programming, Vol. 19, No. 4, pp. 505-535
Main authors: PARAMONOV, SERGEY; STEPANOVA, DARIA; MIETTINEN, PAULI
Format: Journal Article
Language: English
Published: Cambridge, UK: Cambridge University Press, 1 July 2019
ISSN: 1471-0684, 1475-3081
Description
Summary: Detecting small sets of relevant patterns from a given data set is a central challenge in data mining. The relevance of a pattern is based on user-provided criteria; typically, all patterns that satisfy certain criteria are considered relevant. Rule-based languages like answer set programming (ASP) seem well suited for specifying such criteria in a form of constraints. Although progress has been made, on the one hand, on solving individual mining problems and, on the other hand, developing generic mining systems, the existing methods focus either on scalability or on generality. In this paper, we make steps toward combining local (frequency, size, and cost) and global (various condensed representations like maximal, closed, and skyline) constraints in a generic and efficient way. We present a hybrid approach for itemset, sequence, and graph mining which exploits dedicated highly optimized mining systems to detect frequent patterns and then filters the results using declarative ASP. To further demonstrate the generic nature of our hybrid framework, we apply it to a problem of approximately tiling a database. Experiments on real-world data sets show the effectiveness of the proposed method and computational gains for itemset, sequence, and graph mining, as well as approximate tiling. Under consideration in Theory and Practice of Logic Programming.
DOI: 10.1017/S1471068418000467
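
The two-stage idea in the summary (first detect frequent patterns, then filter them by global constraints such as closed or maximal) can be illustrated for itemsets with a minimal Python sketch. This is not the authors' system: the paper uses dedicated optimized miners for the first stage and declarative ASP for the second, whereas here a naive enumerator stands in for the miner and plain Python stands in for the ASP filter. The function names `mine_frequent`, `filter_closed`, and `filter_maximal` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def mine_frequent(transactions, min_support):
    """Stage 1 (assumed stand-in for a dedicated miner): naive level-wise
    enumeration of all itemsets contained in at least min_support transactions."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            support = sum(1 for t in transactions if cset <= t)
            if support >= min_support:
                frequent[cset] = support
                found = True
        if not found:
            break  # anti-monotonicity: no larger itemset can be frequent
    return frequent


def filter_closed(frequent):
    """Stage 2 (assumed stand-in for the ASP filter): keep patterns that have
    no proper superset with the same support."""
    return {
        p: s
        for p, s in frequent.items()
        if not any(p < q and s == frequent[q] for q in frequent)
    }


def filter_maximal(frequent):
    """Stage 2 alternative: keep patterns that have no frequent proper superset."""
    return {
        p: s
        for p, s in frequent.items()
        if not any(p < q for q in frequent)
    }


if __name__ == "__main__":
    # Toy data, invented for this illustration only.
    data = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("a")]
    freq = mine_frequent(data, min_support=2)
    print(sorted("".join(sorted(p)) for p in filter_closed(freq)))
    print(sorted("".join(sorted(p)) for p in filter_maximal(freq)))
```

Keeping the filter separate from the miner mirrors the division of labour the summary describes: the first stage only has to handle the local frequency constraint, and the second stage can be swapped for a different global constraint without touching the miner.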