The Discrete Basis Problem

Bibliographic details
Published in:	IEEE transactions on knowledge and data engineering Vol. 20; no. 10; pp. 1348 - 1362
Main authors:	Miettinen, P., Mielikainen, T., Gionis, A., Das, G., Mannila, H.
Format:	Journal Article
Language:	English
Published:	New York, NY IEEE 01.10.2008 IEEE Computer Society The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:	Algorithms and association rules Applied sciences Artificial intelligence Association rules Boolean algebra classification Classification algorithms Clustering Clustering algorithms Computer languages Computer science; control theory; systems Data processing. List processing. Character string processing Decomposition Exact sciences and technology Greedy algorithms Mathematical analysis Matrices Matrix decomposition Matrix methods Memory organisation. Data processing Mining methods and algorithms Operating systems Partitioning Partitioning algorithms Program processors Software Speech and sound recognition and synthesis. Linguistics Studies Text mining Vectors (mathematics) Text mining Clustering classification Mining methods and algorithms and association rules Cluster analysis Matrix product Statistical association Discrete data Matrix decomposition Data mining Boolean logic Zero one matrix Classification NP hard problem Greedy algorithm Observation data Matrix method clustering text mining
ISSN:	1041-4347, 1558-2191

Description
Summary:	Matrix decomposition methods represent a data matrix as a product of two factor matrices: one containing basis vectors that represent meaningful concepts in the data, and another describing how the observed data can be expressed as combinations of the basis vectors. Decomposition methods have been studied extensively, but many methods return real-valued matrices. Interpreting real-valued factor matrices is hard if the original data is Boolean. In this paper, we describe a matrix decomposition formulation for Boolean data, the Discrete Basis Problem. The problem seeks a Boolean decomposition of a binary matrix, thus allowing the user to easily interpret the basis vectors. We also describe a variation of the problem, the Discrete Basis Partitioning Problem. We show that both problems are NP-hard. For the Discrete Basis Problem, we give a simple greedy algorithm for solving it; for the Discrete Basis Partitioning Problem, we show how it can be solved using existing methods. We present experimental results for the greedy algorithm and compare it against other well-known methods. Our algorithm gives intuitive basis vectors, but its reconstruction error is usually larger than that of the real-valued methods. We discuss the reasons for this behavior.
DOI:	10.1109/TKDE.2008.53
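
Illustration:	The summary describes approximating a binary data matrix by the Boolean product of two binary factor matrices and measuring the reconstruction error. The following is a minimal sketch of that formulation only, not the greedy algorithm from the paper. The function names, the variable names, the toy matrices, and the choice of counting mismatched cells as the error are all assumptions made for illustration.

```python
# Illustrative sketch only; NOT the paper's greedy algorithm.
# All names and the toy data below are hypothetical.

def boolean_product(usage, basis):
    """Boolean matrix product of usage (n x k) and basis (k x d).

    Entry (i, j) is 1 if at least one basis vector l is used by row i
    (usage[i][l] == 1) and contains column j (basis[l][j] == 1).
    Unlike the ordinary product, 1 + 1 = 1 here.
    """
    n, k, d = len(usage), len(basis), len(basis[0])
    return [
        [int(any(usage[i][l] and basis[l][j] for l in range(k)))
         for j in range(d)]
        for i in range(n)
    ]


def reconstruction_error(data, approx):
    """Count the cells where the approximation differs from the data."""
    return sum(
        a != b
        for row_data, row_approx in zip(data, approx)
        for a, b in zip(row_data, row_approx)
    )


# Toy binary data matrix: 3 observations over 3 attributes.
data = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]

# Two Boolean basis vectors (rows) over the same 3 attributes.
basis = [
    [1, 1, 0],
    [0, 1, 1],
]

# Which basis vectors each observation uses.
usage = [
    [1, 0],
    [1, 1],
    [0, 1],
]

approx = boolean_product(usage, basis)
error = reconstruction_error(data, approx)
```

In this toy case the second observation uses both basis vectors, and their overlap in the middle attribute still yields 1 under Boolean arithmetic, so the factors reproduce the data exactly. An ordinary real-valued product of the same factors would give 2 in that cell, which hints at why Boolean factors can be easier to interpret for binary data.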