Proximal recursive generalized hyper-gradient descent method


Bibliographic Details
Published in: Applied Soft Computing, Vol. 175, p. 113073
Main Authors: Zhang, Hao; Lu, Shuxia
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.05.2025
ISSN: 1568-4946
Description
Summary: This paper focuses on the non-convex, non-smooth composite optimization problem, which consists of a non-convex loss function and a non-smooth regularizer that admits a proximal mapping. Existing methods remain limited in handling objective functions that involve a non-smooth regularizer, and determining the step size for solving composite optimization problems is itself a challenge. To address this gap, we propose a proximal recursive gradient descent algorithm using generalized hyper-gradient descent, named ProxSarah-GHD, which utilizes variance reduction techniques and provides update rules for adaptive step sizes. To improve its generalization in proximal gradient descent, a generalized variant of hyper-gradient descent, named Generalized Hyper-gradient Descent (GHD), is proposed in this paper. We prove that ProxSarah-GHD attains a linear convergence rate, with oracle complexity O(ϵ⁻³) in the online setting and O(√n ϵ⁻² + n) in the finite-sum setting. In addition, to avoid the trouble of manually adjusting the batch size, we develop a novel Exponentially Increasing Mini-batch scheme for ProxSarah-GHD, named ProxSarah-GHD-EIM. We also prove that ProxSarah-GHD-EIM achieves a linear convergence rate, with total complexity O(ϵ⁻⁴ + ϵ⁻²) in the online setting and O(n + ϵ⁻⁴ + ϵ⁻²) in the finite-sum setting. Numerical experiments on standard datasets verify the superiority of ProxSarah-GHD over other methods, and we further analyze the sensitivity of ProxSarah-GHD-EIM to its hyperparameters.
Highlights:
• A generalized hyper-gradient descent method is proposed.
• A proximal recursive gradient algorithm with adaptive step size for non-convex optimization is proposed.
• An exponentially increasing mini-batch size method is proposed.
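The record gives no implementation details, so the following is only a minimal sketch of the general idea behind the abstract: a proximal SARAH-type recursive gradient step combined with a hyper-gradient step-size rule, applied here to an l1-regularized least-squares problem. The loss, the constants, and the use of the classical hyper-gradient rule in place of the paper's generalized (GHD) rule are illustrative assumptions, not the authors' algorithm.

```python
# Hypothetical sketch of a proximal SARAH-style step with a hyper-gradient
# step size, in the spirit of the abstract above. The exact ProxSarah-GHD /
# GHD update rules are NOT given in this record; everything below (problem,
# loss, constants, the step-size rule) is an illustrative assumption.
import numpy as np

def soft_threshold(x, tau):
    """Proximal mapping of the l1 regularizer tau * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def prox_sarah_hypergrad(A, b, lam=0.1, eta=0.05, beta=1e-4,
                         outer=20, inner=50, batch=8, seed=0):
    """Minimize (1/n) * sum_i 0.5*(a_i^T w - b_i)^2 + lam*||w||_1.

    SARAH-type recursive (variance-reduced) gradient estimator with a
    proximal step; the classical hyper-gradient rule
        eta <- eta + beta * <v_t, v_{t-1}>
    is used here only as a stand-in for the paper's GHD rule.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w = np.zeros(d)
    for _ in range(outer):
        # Full gradient at the start of each outer loop (SARAH anchor).
        v = A.T @ (A @ w - b) / n
        v_prev = v.copy()
        for _ in range(inner):
            # Proximal gradient step with the current step size.
            w_new = soft_threshold(w - eta * v, eta * lam)
            # Recursive variance-reduced estimator on a mini-batch.
            idx = rng.choice(n, size=batch, replace=False)
            Ai, bi = A[idx], b[idx]
            v_new = v + (Ai.T @ (Ai @ w_new - bi)
                         - Ai.T @ (Ai @ w - bi)) / batch
            # Hyper-gradient step-size adaptation (illustrative rule).
            eta = max(eta + beta * float(v_new @ v_prev), 1e-6)
            w, v_prev, v = w_new, v, v_new
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((200, 50))
    w_true = np.zeros(50)
    w_true[:5] = 1.0
    b = A @ w_true + 0.01 * rng.standard_normal(200)
    w_hat = prox_sarah_hypergrad(A, b)
    print("nonzeros recovered:", int(np.sum(np.abs(w_hat) > 1e-3)))
```

An exponentially increasing mini-batch schedule, as in the ProxSarah-GHD-EIM variant, would correspond to growing the `batch` parameter across outer iterations instead of keeping it fixed; the growth factor used by the paper is not stated in this record.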
DOI: 10.1016/j.asoc.2025.113073