Exact and approximate discrete optimization algorithms for finding useful disjunctions of categorical predicates in data analysis

Bibliographic details
Published in:	Discrete Applied Mathematics, Vol. 144, No. 1, pp. 43–58
Main authors:	Boros, Endre; Menkov, Vladimir
Format:	Journal Article
Language:	English
Published:	Elsevier B.V., 30 November 2004
Keywords:	Binary optimization; Feature generation; Machine learning
ISSN:	0166-218X, 1872-6771

Description
Abstract:	We discuss a discrete optimization problem that arises in data analysis from the binarization of categorical attributes. It can be described as the maximization of a function $F(l_1(x), l_2(x))$, where $l_1(x)$ and $l_2(x)$ are linear functions of binary variables $x \in \{0,1\}^n$ and $F: \mathbb{R}^2 \to \mathbb{R}$. Though this problem is NP-hard in general, an optimal solution $x^*$ can be found in pseudo-polynomial time under some mild monotonicity conditions on $F$. We also present an approximation algorithm which, for any given $\varepsilon > 0$, finds an approximate binary solution $x^\varepsilon$ such that $F(l_1(x^*), l_2(x^*)) - F(l_1(x^\varepsilon), l_2(x^\varepsilon)) < \varepsilon$, at a cost of no more than $O(n \log n + 2^{C/\varepsilon} n)$ operations. Though $C$ depends on the problem instance in general, for the problems arising from the binarization of categorical variables it depends only on $F$, and for all functions considered we have $C \le 1/2$.
DOI:	10.1016/j.dam.2004.06.006
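The pseudo-polynomial claim in the abstract can be illustrated with a small dynamic program. This is a minimal sketch, not the authors' algorithm, and it rests on two assumptions: (i) the coefficients a_i of l_1 are integers, so l_1(x) takes at most sum(|a_i|) + 1 distinct values; (ii) for every fixed first argument, F is monotone (nondecreasing or nonincreasing) in its second argument, which is only one possible reading of the "mild monotonicity conditions". Under (ii), for each reachable value s of l_1 it suffices to remember the smallest and the largest reachable value of l_2, because F(s, .) attains its maximum over that range at one of the two ends. The function name maximize_f_of_two_linear and the parameters a, b, F are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Entry for one reachable value s = l1(x):
# (min l2, a prefix of x attaining it, max l2, a prefix of x attaining it)
Entry = Tuple[int, Tuple[int, ...], int, Tuple[int, ...]]


def maximize_f_of_two_linear(
    a: Sequence[int],
    b: Sequence[int],
    F: Callable[[int, int], float],
) -> Tuple[float, List[int]]:
    """Maximize F(l1(x), l2(x)) over x in {0,1}^n by dynamic programming.

    l1(x) = sum(a[i] * x[i]) and l2(x) = sum(b[i] * x[i]); the a[i] should be
    integers for the pseudo-polynomial bound. ASSUMPTION (hypothetical sketch,
    not the paper's method): for every fixed s, F(s, .) is monotone, so the
    best l2 for l1 = s is either the smallest or the largest reachable one.
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    # Before any variable is fixed, only l1 = 0, l2 = 0 (empty prefix) is reachable.
    table: Dict[int, Entry] = {0: (0, (), 0, ())}
    for ai, bi in zip(a, b):
        new: Dict[int, Entry] = {}
        for s, (lo, x_lo, hi, x_hi) in table.items():
            for bit in (0, 1):
                s2 = s + bit * ai
                cand = (lo + bit * bi, x_lo + (bit,), hi + bit * bi, x_hi + (bit,))
                if s2 not in new:
                    new[s2] = cand
                    continue
                lo_b, x_lo_b, hi_b, x_hi_b = new[s2]
                # Keep the smaller minimum and the larger maximum of l2.
                if cand[0] < lo_b:
                    lo_b, x_lo_b = cand[0], cand[1]
                if cand[2] > hi_b:
                    hi_b, x_hi_b = cand[2], cand[3]
                new[s2] = (lo_b, x_lo_b, hi_b, x_hi_b)
        table = new
    # Evaluate F only at the extreme l2 values of each reachable l1 value.
    best_val: Optional[float] = None
    best_x: List[int] = []
    for s, (lo, x_lo, hi, x_hi) in table.items():
        for l2, x in ((lo, x_lo), (hi, x_hi)):
            v = F(s, l2)
            if best_val is None or v > best_val:
                best_val, best_x = v, list(x)
    assert best_val is not None  # the table always has at least one entry
    return best_val, best_x
```

In the binarization setting, one illustrative choice (an assumption, not taken from the paper) is a categorical attribute whose values i = 1, ..., n occur p_i times among positive and q_i times among negative examples; with a = p, b = q and F(p, q) = p/P - q/Q, where P and Q are the class totals, x selects the values included in the disjunction. For p = [5, 3, 0], q = [1, 4, 2], P = 8 and Q = 7, the call returns x = [1, 0, 0], i.e. the disjunction consisting of the first value only. The table holds at most sum(|a_i|) + 1 entries; because this sketch copies prefix tuples of length up to n, its cost is roughly O(n^2 (sum(|a_i|) + 1)), and storing back-pointers instead would remove the extra factor of n.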