Proximal recursive generalized hyper-gradient descent method
| Published in: | Applied Soft Computing, Vol. 175, p. 113073 |
|---|---|
| Main authors: | , |
| Format: | Journal Article |
| Language: | English |
| Publication details: | Elsevier B.V., 01.05.2025 |
| Subject: | |
| ISSN: | 1568-4946 |
| Online access: | Get full text |
| Summary: | This paper focuses on the non-convex, non-smooth composite optimization problem, which consists of a non-convex loss function and a non-smooth regularizer that admits a proximal mapping. Existing methods, however, remain limited in handling objective functions that involve a non-smooth regularizer, and determining the step size for solving composite optimization problems can be challenging. To address this gap, we propose a proximal recursive gradient descent algorithm with generalized hyper-gradient descent, named ProxSarah-GHD, which uses variance reduction techniques and provides update rules for adaptive step sizes. To improve the generalization of hyper-gradient descent in the proximal setting, a generalized variant, named Generalized Hyper-gradient Descent (GHD), is proposed in this paper. We prove that ProxSarah-GHD attains a linear convergence rate, and we establish its oracle complexity as O(ε⁻³) in the online setting and O(√n·ε⁻² + n) in the finite-sum setting. In addition, to avoid the trouble of manually adjusting the batch size, we develop a novel Exponentially Increasing Mini-batch scheme for ProxSarah-GHD, named ProxSarah-GHD-EIM. We also provide a theoretical analysis showing that ProxSarah-GHD-EIM achieves a linear convergence rate, with total complexity O(ε⁻⁴ + ε⁻²) in the online setting and O(n + ε⁻⁴ + ε⁻²) in the finite-sum setting. Numerical experiments on standard datasets verify the superiority of ProxSarah-GHD over other methods, and we further analyze the sensitivity of ProxSarah-GHD-EIM to its hyperparameters on standard datasets. |
|---|---|
| Highlights: | A generalized hyper-gradient descent method is proposed. • A proximal recursive gradient algorithm with adaptive step size for non-convex optimization is proposed. • An exponentially increasing mini-batch size scheme is proposed. |
| ISSN: | 1568-4946 |
| DOI: | 10.1016/j.asoc.2025.113073 |
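
The record contains no pseudocode, so the following is a minimal, hypothetical Python sketch of the ingredients the abstract names: a SARAH-style recursive gradient estimator (variance reduction), a proximal step for the non-smooth regularizer (illustrated with l1 soft-thresholding), and an adaptive step size in the spirit of hyper-gradient descent. The function names, the l1 choice, and the additive step-size rule are assumptions for illustration; the paper's actual GHD update and ProxSarah-GHD algorithm are not reproduced here.

```python
import numpy as np

# Illustrative sketch only. Assumes grad_i(x, idx) returns the mini-batch
# gradient of the smooth loss averaged over the sample indices in idx.

def prox_l1(x, lam):
    """Proximal mapping of lam * ||x||_1 (soft-thresholding); stands in for
    any regularizer that admits a proximal mapping."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def prox_sarah_sketch(grad_i, n, x0, lam=0.01, eta=0.1, beta=1e-4,
                      epochs=5, batch=32, rng=np.random.default_rng(0)):
    x = x0.copy()
    for _ in range(epochs):
        # Variance-reduction anchor: full gradient at the start of each outer loop.
        v = grad_i(x, np.arange(n))
        x_prev, v_prev = x.copy(), v.copy()
        x = prox_l1(x - eta * v, eta * lam)
        for _ in range(n // batch):
            idx = rng.choice(n, size=batch, replace=False)
            # SARAH recursive estimator: v_t = g(x_t) - g(x_{t-1}) + v_{t-1}.
            v = grad_i(x, idx) - grad_i(x_prev, idx) + v_prev
            # Assumed hyper-gradient-style step-size rule (additive form):
            # grow eta when consecutive estimators align, shrink otherwise.
            eta = max(eta + beta * float(np.dot(v, v_prev)), 1e-6)
            x_prev, v_prev = x.copy(), v.copy()
            # Proximal step on the regularized objective.
            x = prox_l1(x - eta * v, eta * lam)
    return x
```

The exponentially increasing mini-batch idea (EIM) mentioned in the abstract would, under the same assumptions, replace the fixed `batch` above with one that grows geometrically across outer loops; that variant is not sketched here.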