Distributed Pareto Optimization for Large-Scale Noisy Subset Selection

Subset selection, aiming to select the best subset from a ground set with respect to some objective function, is a fundamental problem with applications in many areas, such as combinatorial optimization, machine learning, data mining, computer vision, information retrieval, etc. Along with the devel...

Full description

Saved in:
Bibliographic Details
Published in:IEEE transactions on evolutionary computation Vol. 24; no. 4; pp. 694 - 707
Main Author: Qian, Chao
Format: Journal Article
Language:English
Published: New York IEEE 01.08.2020
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:
ISSN:1089-778X, 1941-0026
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Subset selection, aiming to select the best subset from a ground set with respect to some objective function, is a fundamental problem with applications in many areas, such as combinatorial optimization, machine learning, data mining, computer vision, information retrieval, etc. Along with the development of data collection and storage, the size of the ground set grows larger. Furthermore, in many subset selection applications, the objective function evaluation is subject to noise. We thus study the large-scale noisy subset selection problem in this paper. The recently proposed DPOSS algorithm based on multiobjective evolutionary optimization is a powerful distributed solver for large-scale subset selection. Its performance, however, has been only validated in the noise-free environment. In this paper, we first prove its approximation guarantee under two common noise models, i.e., multiplicative noise and additive noise, disclosing that the presence of noise degrades the performance of DPOSS largely. Next, we propose a new distributed multiobjective evolutionary algorithm called DPONSS for large-scale noisy subset selection. We prove that the approximation guarantee of DPONSS under noise is significantly better than that of DPOSS. We also conduct experiments on the application of sparse regression, where the objective evaluation is often estimated using a sample data, bringing noise. The results on various real-world data sets, whose size can reach millions, clearly show the excellent performance of DPONSS.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:1089-778X
1941-0026
DOI:10.1109/TEVC.2019.2929555