Large-scale robust regression with truncated loss via majorization-minimization algorithm
| Published in: | European Journal of Operational Research, Vol. 319, No. 2, pp. 494–504 |
|---|---|
| Main Authors: | , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V., 01.12.2024 |
| Subjects: | |
| ISSN: | 0377-2217 |
| Summary: | Regression methods employing truncated loss functions are widely praised for their robustness to outliers and for representing the solution sparsely in terms of the samples. However, owing to the non-convexity of the truncated loss, commonly used algorithms such as the difference-of-convex algorithm (DCA) fail to maintain sparsity when dealing with non-convex loss functions, and adapting DCA for efficient optimization incurs additional development costs. To address these challenges, we propose a novel approach called truncated loss regression via the majorization-minimization algorithm (TLRM). TLRM employs a surrogate function to approximate the original truncated loss regression and offers several desirable properties: (i) it eliminates outliers before the training process and encapsulates general convex loss regression within its structure as iterative subproblems; (ii) it solves the convex loss problem iteratively, which makes a well-established toolbox for convex optimization directly applicable; (iii) it converges to a truncated loss regression and provides a solution with sample sparsity. Extensive experiments demonstrate that TLRM achieves superior sparsity without sacrificing robustness, and it can be several tens of thousands of times faster than traditional DCA on large-scale problems. Moreover, TLRM scales to datasets with millions of samples, making it a practical choice for real-world scenarios. The codebase for methods with truncated loss functions is accessible at https://i-do-lab.github.io/optimal-group.org/Resources/Code/TLRM.html. |
|---|---|
| DOI: | 10.1016/j.ejor.2024.04.028 |

Highlights:
- Propose an algorithmic framework (TLRM) for general truncated loss regression.
- Eliminate outliers before the training process to ensure sparsity.
- Unveil the intrinsic connection between convex loss and truncated loss.
- Enhance efficiency and scalability with well-established convex algorithms.
- Experiments verify that TLRM excels in sparsity, scalability, efficiency, and reliability.
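To make the majorization-minimization idea described in the abstract concrete, the following is a minimal illustrative sketch, not the authors' published code. It assumes a truncated squared loss, sum_i min((y_i - x_i' w)^2, tau): because min(u, tau) is concave in u, a linear majorizer at the current iterate assigns weight 0 to samples whose loss exceeds tau (outliers are dropped before the fit) and weight 1 to the rest, so each MM step reduces to an ordinary convex regression on the retained samples. The function name `tlrm_fit`, the ridge term `lam`, and the stopping rule are illustrative assumptions.

```python
import numpy as np

def tlrm_fit(X, y, tau, max_iter=100, lam=1e-6):
    """Illustrative MM sketch for regression with a truncated squared loss
    sum_i min((y_i - x_i @ w)**2, tau); not the authors' published code.

    Each MM step linearizes the concave truncation min(u, tau) at the
    current residuals: samples whose loss exceeds tau contribute only a
    constant to the surrogate (they are eliminated before training),
    while the remaining samples keep the convex squared loss, yielding
    a convex subproblem solved here in closed form.
    """
    n, d = X.shape
    w = np.zeros(d)
    inliers = np.ones(n, dtype=bool)
    for _ in range(max_iter):
        losses = (y - X @ w) ** 2
        inliers = losses <= tau           # samples retained by the surrogate
        Xi, yi = X[inliers], y[inliers]
        # Ridge-regularized least-squares subproblem on the inliers only.
        w_new = np.linalg.solve(Xi.T @ Xi + lam * np.eye(d), Xi.T @ yi)
        if np.allclose(w_new, w):         # inlier set has stabilized
            break
        w = w_new
    return w, inliers
```

The closed-form ridge step stands in for any convex solver; swapping in, say, a Huber or hinge-loss solver for the subproblem would follow the same pattern, which is the point of treating convex loss regression as the iterative subproblem.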