Proximal recursive generalized hyper-gradient descent method
| Published in: | Applied Soft Computing, Vol. 175, p. 113073 |
|---|---|
| Main Authors: | , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V., 01.05.2025 |
| Subjects: | |
| ISSN: | 1568-4946 |
| Summary: | This paper focuses on the non-convex, non-smooth composite optimization problem, which consists of a non-convex loss function and a non-smooth regularizer that admits a proximal mapping. Existing methods remain limited in handling objective functions that involve a non-smooth regularizer, and determining the step size for solving composite optimization problems can be challenging. To address this gap, we propose a proximal recursive gradient descent algorithm with generalized hyper-gradient descent, named ProxSarah-GHD, which utilizes variance reduction techniques and provides update rules for adaptive step sizes. To improve its generalization in proximal gradient descent, a generalized variant of hyper-gradient descent, named Generalized Hyper-gradient Descent (GHD), is proposed in this paper. We prove that ProxSarah-GHD attains a linear convergence rate. Moreover, we provide the oracle complexity of ProxSarah-GHD as O(ε^{-3}) in the online setting and O(√n·ε^{-2} + n) in the finite-sum setting. In addition, to avoid the trouble of manually tuning the batch size, we develop a novel Exponentially Increasing Mini-batch scheme for ProxSarah-GHD, named ProxSarah-GHD-EIM. We also provide a theoretical analysis showing that ProxSarah-GHD-EIM achieves a linear convergence rate, with total complexity of O(ε^{-4} + ε^{-2}) in the online setting and O(n + ε^{-4} + ε^{-2}) in the finite-sum setting. Numerical experiments on standard datasets verify the superiority of ProxSarah-GHD over other methods. We further analyze the sensitivity of ProxSarah-GHD-EIM to its hyperparameters through experiments on standard datasets. (Illustrative sketches of a hyper-gradient-adapted proximal step and an exponentially increasing mini-batch schedule appear after this record.)
• A generalized hyper-gradient descent method is proposed.
• A proximal recursive gradient algorithm with adaptive step size for non-convex optimization is proposed.
• An exponentially increasing mini-batch size method is proposed. |
|---|---|
| ISSN: | 1568-4946 |
| DOI: | 10.1016/j.asoc.2025.113073 |
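The abstract describes ProxSarah-GHD as a variance-reduced (SARAH-type) proximal gradient method whose step size is adapted by a generalized hyper-gradient rule. The sketch below is a minimal illustration of that combination on l1-regularized logistic regression, assuming the standard SARAH recursive estimator, soft-thresholding proximal step, and the classic hyper-gradient step-size update as a stand-in for the paper's GHD rule; all function names and constants are illustrative, not the authors' implementation.

```python
# Hypothetical sketch: proximal SARAH loop with a hyper-gradient-adapted step
# size, applied to l1-regularized logistic regression. The paper's exact
# ProxSarah-GHD update and its Generalized Hyper-gradient Descent (GHD) rule
# are not reproduced here.
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1 (soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def logistic_grad(w, X, y):
    """Gradient of the average logistic loss over the rows of X."""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (p - y) / X.shape[0]

def prox_sarah_hypergrad(X, y, lam=1e-3, eta=0.1, beta=1e-4,
                         epochs=10, inner=50, batch=32, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        # Full gradient at the start of each outer loop (SARAH anchor).
        v = logistic_grad(w, X, y)
        w_prev, v_prev = w.copy(), v.copy()
        w = soft_threshold(w - eta * v, eta * lam)
        for _ in range(inner):
            idx = rng.choice(n, size=batch, replace=False)
            # Recursive (SARAH) variance-reduced gradient estimator.
            v = (logistic_grad(w, X[idx], y[idx])
                 - logistic_grad(w_prev, X[idx], y[idx]) + v_prev)
            # Classic hyper-gradient step-size update (inner product of
            # consecutive estimators), used as a stand-in for the GHD rule.
            eta = max(eta + beta * float(v @ v_prev), 1e-6)
            w_prev, v_prev = w.copy(), v.copy()
            # Proximal step for the non-smooth l1 regularizer.
            w = soft_threshold(w - eta * v, eta * lam)
    return w
```

With synthetic data `X` and binary labels `y`, `prox_sarah_hypergrad(X, y)` returns a sparse weight vector; the step size grows when consecutive gradient estimates align and shrinks when they oppose each other, which is the intuition behind hyper-gradient adaptation.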
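ProxSarah-GHD-EIM replaces manual batch-size tuning with an Exponentially Increasing Mini-batch scheme. A minimal sketch of such a schedule is shown below, assuming geometric growth capped at the dataset size; the initial batch size, growth factor, and cap are illustrative assumptions rather than the paper's constants.

```python
# Hypothetical sketch of an exponentially increasing mini-batch (EIM)
# schedule: the batch size grows geometrically across outer rounds until it
# reaches the dataset size.
def eim_batch_sizes(b0=16, rho=2.0, n=10000, rounds=12):
    """Return the mini-batch size used in each outer round."""
    return [min(int(b0 * rho ** t), n) for t in range(rounds)]

print(eim_batch_sizes())  # [16, 32, 64, ..., 8192, 10000, 10000]
```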