Distributed Pareto Optimization for Large-Scale Noisy Subset Selection


Detailed Bibliography
Published in: IEEE Transactions on Evolutionary Computation, Vol. 24, No. 4, pp. 694-707
Main author: Qian, Chao
Medium: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.08.2020
ISSN: 1089-778X, 1941-0026
Description
Summary: Subset selection, which aims to select the best subset from a ground set with respect to some objective function, is a fundamental problem with applications in many areas, such as combinatorial optimization, machine learning, data mining, computer vision, and information retrieval. With the continuing growth of data collection and storage, the size of the ground set grows ever larger. Furthermore, in many subset selection applications, the objective function evaluation is subject to noise. This paper therefore studies the large-scale noisy subset selection problem. The recently proposed DPOSS algorithm, based on multiobjective evolutionary optimization, is a powerful distributed solver for large-scale subset selection, but its performance has only been validated in noise-free environments. We first prove its approximation guarantee under two common noise models, multiplicative noise and additive noise, revealing that the presence of noise largely degrades the performance of DPOSS. Next, we propose a new distributed multiobjective evolutionary algorithm called DPONSS for large-scale noisy subset selection, and prove that its approximation guarantee under noise is significantly better than that of DPOSS. We also conduct experiments on the application of sparse regression, where the objective evaluation is often estimated from sample data and is therefore noisy. The results on various real-world data sets, whose sizes reach into the millions, clearly show the excellent performance of DPONSS.
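
The abstract mentions two ingredients that a small example can make concrete: the bi-objective (Pareto) reformulation of subset selection used by the POSS/DPOSS family, and objective evaluations perturbed by noise. The Python sketch below is only an illustration of those two ideas under an assumed multiplicative noise model, not the paper's DPOSS or DPONSS algorithms (which are distributed and use a dedicated noise-aware comparison); the function names, the simple archive update, and the noise level eps are assumptions made for the example.

import random

def noisy_eval(f, subset, eps=0.1):
    # A common multiplicative noise model: the observed value lies in
    # [(1 - eps) * f(subset), (1 + eps) * f(subset)].
    return f(subset) * (1.0 + random.uniform(-eps, eps))

def pareto_subset_selection(f, n, k, iterations=10_000, eps=0.1):
    # Maximize f over subsets of {0, ..., n-1} with |subset| <= k by treating
    # (noisy f value, subset size) as two objectives and keeping an archive
    # of mutually non-dominated subsets.
    empty = frozenset()
    archive = {empty: noisy_eval(f, empty, eps)}  # solution -> noisy estimate

    def dominates(a, b):
        # a dominates b: no worse on both objectives, strictly better on one.
        weakly = archive[a] >= archive[b] and len(a) <= len(b)
        strictly = archive[a] > archive[b] or len(a) < len(b)
        return weakly and strictly

    for _ in range(iterations):
        parent = random.choice(list(archive))
        # Bit-wise mutation: flip each element in or out with probability 1/n.
        child = set(parent)
        for i in range(n):
            if random.random() < 1.0 / n:
                child.symmetric_difference_update({i})
        child = frozenset(child)
        if len(child) >= 2 * k:  # discard clearly oversized solutions
            continue
        archive[child] = noisy_eval(f, child, eps)
        if any(dominates(other, child) for other in archive if other != child):
            del archive[child]
        else:
            for other in [s for s in archive if s != child and dominates(child, s)]:
                del archive[other]

    feasible = [s for s in archive if len(s) <= k]
    return max(feasible, key=lambda s: archive[s])

# Toy usage with a modular (additive) objective, so the true optimum is known:
if __name__ == "__main__":
    weights = [random.random() for _ in range(50)]
    f = lambda s: sum(weights[i] for i in s)
    best = pareto_subset_selection(f, n=50, k=8)
    print(sorted(best), sum(weights[i] for i in best))

In this bi-objective view the subset size acts as the second objective, so the archive keeps one good candidate per size and the final solution is read off from the feasible part of the archive; the paper's contribution concerns how such a search behaves, and can be made robust, when the stored objective estimates are noisy.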
DOI: 10.1109/TEVC.2019.2929555