Exact and approximate discrete optimization algorithms for finding useful disjunctions of categorical predicates in data analysis
We discuss a discrete optimization problem that arises in data analysis from the binarization of categorical attributes. It can be described as the maximization of a function $F(l_1(x), l_2(x))$, where $l_1(x)$ and $l_2(x)$ are linear functions of binary variables $x \in \{0,1\}^n$, and F...
| Published in: | Discrete Applied Mathematics, Volume 144, Issue 1, pp. 43-58 |
|---|---|
| Main authors: | , |
| Format: | Journal Article |
| Language: | English |
| Publication details: | Elsevier B.V, 30.11.2004 |
| Subject: | |
| ISSN: | 0166-218X, 1872-6771 |
| Summary: | We discuss a discrete optimization problem that arises in data analysis from the binarization of categorical attributes. It can be described as the maximization of a function $F(l_1(x), l_2(x))$, where $l_1(x)$ and $l_2(x)$ are linear functions of binary variables $x \in \{0,1\}^n$, and $F : \mathbb{R}^2 \longrightarrow \mathbb{R}$. Though this problem is NP-hard, in general, an optimal solution $x^*$ of it can be found, under some mild monotonicity conditions on $F$, in pseudo-polynomial time. We also present an approximation algorithm which finds an approximate binary solution $x^\varepsilon$, for any given $\varepsilon > 0$, such that $F(l_1(x^*), l_2(x^*)) - F(l_1(x^\varepsilon), l_2(x^\varepsilon)) < \varepsilon$, at the cost of no more than $O(n \log n + 2^{C/\varepsilon} n)$ operations. Though in general $C$ depends on the problem instance, for the problems arising from binarization of categorical variables it depends only on $F$, and for all functions considered we have $C \leqslant 1/2$. |
| DOI: | 10.1016/j.dam.2004.06.006 |
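
The problem stated in the summary can be illustrated with a small sketch. This is not the algorithm from the article. It is a knapsack-style dynamic program under assumptions made here for illustration only: the coefficients of $l_1$ are nonnegative integers, the coefficients of $l_2$ are nonnegative, both linear functions have zero constant term, and $F$ is nonincreasing in its second argument (one possible reading of a "mild monotonicity condition"). Under those assumptions it suffices to know, for each reachable value $s$ of $l_1(x)$, the smallest attainable $l_2(x)$, which takes time proportional to $n \cdot \sum_i a_i$, i.e. pseudo-polynomial time. The function names and the example $F$ are hypothetical.

```python
from itertools import product
from math import inf


def maximize_bruteforce(F, a, b):
    """Enumerate every x in {0,1}^n (exponential; for checking only)."""
    best_val, best_x = -inf, None
    for x in product((0, 1), repeat=len(a)):
        l1 = sum(ai * xi for ai, xi in zip(a, x))
        l2 = sum(bi * xi for bi, xi in zip(b, x))
        val = F(l1, l2)
        if val > best_val:
            best_val, best_x = val, x
    return best_val, best_x


def maximize_dp(F, a, b):
    """Pseudo-polynomial sketch.

    Assumptions (not taken from the article): a_i are nonnegative
    integers, b_i are nonnegative, and F is nonincreasing in its
    second argument.
    """
    total = sum(a)
    # min_l2[s] = smallest l2(x) over all x with l1(x) == s
    # (inf if no x reaches s).
    min_l2 = [inf] * (total + 1)
    min_l2[0] = 0
    for ai, bi in zip(a, b):
        # Go downward in s so each variable is used at most once.
        for s in range(total, ai - 1, -1):
            cand = min_l2[s - ai] + bi
            if cand < min_l2[s]:
                min_l2[s] = cand
    # For fixed l1 = s, the smallest l2 is best because F is
    # nonincreasing in its second argument.
    return max(F(s, v) for s, v in enumerate(min_l2) if v < inf)


if __name__ == "__main__":
    # Hypothetical objective: reward l1, penalize l2.
    F = lambda p, q: p / (1 + q)
    a = [3, 1, 4, 2]
    b = [1, 2, 1, 3]
    print(maximize_dp(F, a, b), maximize_bruteforce(F, a, b))
```

The brute-force routine is included only so the dynamic program can be checked on small instances; the two should return the same optimal value whenever the stated assumptions hold.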