Exact and approximate discrete optimization algorithms for finding useful disjunctions of categorical predicates in data analysis

Bibliographic Details
Published in: Discrete Applied Mathematics, Vol. 144, No. 1, pp. 43-58
Main authors: Boros, Endre; Menkov, Vladimir
Format: Journal Article
Language: English
Publication details: Elsevier B.V., 30 November 2004
ISSN: 0166-218X, 1872-6771
Description
Summary: We discuss a discrete optimization problem that arises in data analysis from the binarization of categorical attributes. It can be described as the maximization of a function $F(l_1(x), l_2(x))$, where $l_1(x)$ and $l_2(x)$ are linear functions of binary variables $x \in \{0,1\}^n$, and $F : \mathbb{R}^2 \to \mathbb{R}$. Though this problem is NP-hard in general, an optimal solution $x^*$ can be found, under some mild monotonicity conditions on $F$, in pseudo-polynomial time. We also present an approximation algorithm which finds an approximate binary solution $x^\varepsilon$, for any given $\varepsilon > 0$, such that $F(l_1(x^*), l_2(x^*)) - F(l_1(x^\varepsilon), l_2(x^\varepsilon)) < \varepsilon$, at the cost of no more than O(n log n + 2 C / ε n) operations. Though in general $C$ depends on the problem instance, for the problems arising from binarization of categorical variables it depends only on $F$, and for all functions considered we have $C \le 1/2$.
DOI:10.1016/j.dam.2004.06.006
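
To illustrate why a monotonicity condition can make the problem in the summary tractable in pseudo-polynomial time, here is a minimal sketch. It is not the algorithm from the article. It assumes, purely for illustration, that the coefficients of l_1 are nonnegative integers and that F is nondecreasing in its second argument. Under those assumptions it is enough to record, for each reachable value of l_1, the largest attainable value of l_2. The function name `maximize_two_linear` and its parameters are hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple


def maximize_two_linear(
    a: List[int],
    b: List[float],
    F: Callable[[float, float], float],
) -> Tuple[float, List[int]]:
    """Maximize F(l1(x), l2(x)) over x in {0,1}^n, where
    l1(x) = sum(a[i] * x[i]) and l2(x) = sum(b[i] * x[i]).

    Assumptions (not taken from the article): every a[i] is a nonnegative
    integer, and F is nondecreasing in its second argument.
    """
    # best[v] = (largest l2 over all x with l1(x) == v, indices set to 1)
    best: Dict[int, Tuple[float, Tuple[int, ...]]] = {0: (0.0, ())}
    for i, (ai, bi) in enumerate(zip(a, b)):
        updated = dict(best)  # states where x[i] = 0
        for v, (l2, chosen) in best.items():
            # state reached by setting x[i] = 1
            cand = (l2 + bi, chosen + (i,))
            if v + ai not in updated or cand[0] > updated[v + ai][0]:
                updated[v + ai] = cand
        best = updated

    # Because F is nondecreasing in l2, the optimum is attained at one of
    # the recorded (v, max l2) pairs.
    best_val, best_chosen = -math.inf, ()
    for v, (l2, chosen) in best.items():
        val = F(v, l2)
        if val > best_val:
            best_val, best_chosen = val, chosen

    x = [0] * len(a)
    for i in best_chosen:
        x[i] = 1
    return best_val, x


value, x = maximize_two_linear(
    [1, 2, 3], [3.0, -1.0, 2.0], lambda u, v: v - (u - 3) ** 2
)
print(value, x)
```

The table holds at most sum(a) + 1 entries, so the number of state updates grows with n times sum(a), which is polynomial in the magnitude of the coefficients rather than in their bit length. Storing the chosen indices as tuples keeps the sketch short but adds copying overhead that a more careful implementation would avoid.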