Exact and approximate discrete optimization algorithms for finding useful disjunctions of categorical predicates in data analysis

Bibliographic details
Published in:	Discrete Applied Mathematics, Volume 144, Issue 1, pp. 43-58
Main authors:	Boros, Endre; Menkov, Vladimir
Format:	Journal Article
Language:	English
Publication details:	Elsevier B.V., 30 November 2004
Subjects:	Binary optimization; Feature generation; Machine learning
ISSN:	0166-218X, 1872-6771

Description
Summary:	We discuss a discrete optimization problem that arises in data analysis from the binarization of categorical attributes. It can be described as the maximization of a function $F(l_1(x), l_2(x))$, where $l_1(x)$ and $l_2(x)$ are linear functions of binary variables $x \in \{0,1\}^n$, and $F : \mathbb{R}^2 \to \mathbb{R}$. Though this problem is NP-hard in general, an optimal solution $x^*$ can be found in pseudo-polynomial time under some mild monotonicity conditions on $F$. We also present an approximation algorithm which, for any given $\varepsilon > 0$, finds an approximate binary solution $x^{\varepsilon}$ such that $F(l_1(x^*), l_2(x^*)) - F(l_1(x^{\varepsilon}), l_2(x^{\varepsilon})) < \varepsilon$, at the cost of no more than $O(n \log n + 2^{C/\varepsilon} n)$ operations. Though in general $C$ depends on the problem instance, for the problems arising from binarization of categorical variables it depends only on $F$, and for all functions considered we have $C \leq 1/2$.
DOI:	10.1016/j.dam.2004.06.006
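
Illustration:	The problem stated in the summary, maximizing F(l1(x), l2(x)) over binary vectors x, can be made concrete with a small exhaustive search. The sketch below is not the exact or approximation algorithm of the article; it enumerates all 2^n binary vectors and is only practical for very small n. The function name `brute_force_maximize`, the coefficient lists `a` and `b`, the offsets `a0` and `b0`, and the sample objective are all hypothetical choices made for this example.

```python
from itertools import product
from typing import Callable, Sequence, Tuple


def brute_force_maximize(
    F: Callable[[float, float], float],
    a: Sequence[float],
    b: Sequence[float],
    a0: float = 0.0,
    b0: float = 0.0,
) -> Tuple[Tuple[int, ...], float]:
    """Maximize F(l1(x), l2(x)) over x in {0,1}^n by full enumeration.

    l1(x) = a0 + sum(a[i] * x[i]) and l2(x) = b0 + sum(b[i] * x[i]).
    Runs in O(2^n * n) time, so it is an illustration only.
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    best_x: Tuple[int, ...] = ()
    best_val = float("-inf")
    for x in product((0, 1), repeat=len(a)):
        l1 = a0 + sum(ai * xi for ai, xi in zip(a, x))
        l2 = b0 + sum(bi * xi for bi, xi in zip(b, x))
        val = F(l1, l2)
        if val > best_val:  # strict comparison keeps the first maximizer found
            best_x, best_val = x, val
    return best_x, best_val


# Hypothetical data: a[i] and b[i] count positive and negative examples
# having the i-th value of a categorical attribute. The sample objective
# rewards covering positives while excluding negatives (11 = sum of b).
print(brute_force_maximize(lambda u, v: u * (11 - v), [5, 3, 2], [1, 4, 6]))
# prints ((1, 0, 0), 50.0)
```

Here x[i] = 1 means that the i-th category value is included in the disjunction, so the selected vector (1, 0, 0) corresponds to a predicate that tests only for the first value.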