Exact and approximate discrete optimization algorithms for finding useful disjunctions of categorical predicates in data analysis

Bibliographic details
Published in: Discrete Applied Mathematics, Vol. 144, No. 1, pp. 43-58
Main authors: Boros, Endre; Menkov, Vladimir
Format: Journal Article
Language: English
Published: Elsevier B.V., 30 November 2004
ISSN:0166-218X, 1872-6771
Description
Abstract: We discuss a discrete optimization problem that arises in data analysis from the binarization of categorical attributes. It can be described as the maximization of a function $F(l_1(x), l_2(x))$, where $l_1(x)$ and $l_2(x)$ are linear functions of binary variables $x \in \{0,1\}^n$, and $F : \mathbb{R}^2 \to \mathbb{R}$. Though this problem is NP-hard in general, an optimal solution $x^*$ can be found in pseudo-polynomial time under some mild monotonicity conditions on $F$. We also present an approximation algorithm which, for any given $\varepsilon > 0$, finds an approximate binary solution $x^\varepsilon$ such that $F(l_1(x^*), l_2(x^*)) - F(l_1(x^\varepsilon), l_2(x^\varepsilon)) < \varepsilon$, at the cost of no more than $O(n \log n + 2 C / \varepsilon \, n)$ operations. Though in general $C$ depends on the problem instance, for the problems arising from binarization of categorical variables it depends only on $F$, and for all functions considered we have $C \leqslant 1/2$.
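Note: the problem statement in the abstract can be made concrete with a small exhaustive search. The sketch below is only an illustration of the objective and is not the pseudo-polynomial or approximation algorithm from the article. The coefficient vectors `a` and `b`, the choice F(u, v) = u(1 - v), and the function name `brute_force_max` are all assumptions made for this example; the linear functions are taken without constant terms.

```python
from itertools import product


def brute_force_max(a, b, F):
    """Maximize F(l1(x), l2(x)) over all x in {0,1}^n by enumeration.

    l1(x) = sum(a[i] * x[i]) and l2(x) = sum(b[i] * x[i]) are the two
    linear functions (hypothetical coefficients, no constant term).
    Runs in O(2^n * n) time, so it is only usable for very small n.
    """
    best_x, best_val = None, float("-inf")
    for x in product((0, 1), repeat=len(a)):
        l1 = sum(ai * xi for ai, xi in zip(a, x))
        l2 = sum(bi * xi for bi, xi in zip(b, x))
        val = F(l1, l2)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val


# Assumed toy instance: three values of one categorical attribute.
# a[i]: fraction of positive examples having value i,
# b[i]: fraction of negative examples having value i.
a = [0.5, 0.3, 0.2]
b = [0.1, 0.4, 0.5]


def F(u, v):
    # Assumed objective: increasing in u, decreasing in v on [0, 1]^2.
    return u * (1.0 - v)


best_x, best_val = brute_force_max(a, b, F)
print(best_x, round(best_val, 4))
```

On this toy instance the search selects only the first category value, x = (1, 0, 0), with objective value of about 0.45. Enumeration takes exponential time in n, which is the cost the algorithms described in the abstract are meant to avoid.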
DOI:10.1016/j.dam.2004.06.006