Adaptive stochastic conjugate gradient for machine learning

Bibliographic Details
Published in: Expert Systems with Applications, Vol. 206, p. 117719
Main Author: Yang, Zhuang
Format: Journal Article
Language: English
Published: Elsevier Ltd, 15.11.2022
ISSN: 0957-4174, 1873-6793
Description
Summary: Due to their faster convergence rate compared with gradient descent algorithms and lower computational cost compared with second-order algorithms, conjugate gradient (CG) algorithms have been widely used in machine learning. This paper considers conjugate gradient in the mini-batch setting. Concretely, we propose a stable adaptive stochastic conjugate gradient (SCG) algorithm by incorporating both the stochastic recursive gradient algorithm (SARAH) and second-order information into a CG-type algorithm. Unlike most existing CG algorithms, which spend considerable time determining the step size via line search and may fail in stochastic optimization, the proposed algorithms use a local quadratic model to estimate the step size sequence without computing Hessian information, which keeps their computational cost as low as that of first-order algorithms. We establish the linear convergence rate of a class of SCG algorithms when the loss function is strongly convex. Moreover, we show that the complexity of the proposed algorithm matches that of modern stochastic optimization algorithms. As a by-product, we develop a practical variant of the proposed algorithm by setting a stopping criterion for the number of inner-loop iterations. Various numerical experiments on machine learning problems demonstrate the efficiency of the proposed algorithms.

Highlights:
• The efficacy of conjugate gradient with noisy gradients is verified.
• The linear convergence result of our method for strongly convex cases is obtained.
• Our methods outperform several modern stochastic optimization methods.
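The abstract combines three ingredients: a SARAH-style recursive gradient estimator, a CG-type search direction, and a step size taken from a local quadratic model instead of a line search. The following is a minimal illustrative sketch of how these pieces can fit together, not the paper's actual pseudocode: it assumes an l2-regularized logistic regression objective, a Fletcher-Reeves-type beta, and a finite-difference Hessian-vector product to estimate curvature without forming the Hessian; all function names and hyperparameter values (scg_sarah, batch, eps, etc.) are hypothetical.

```python
import numpy as np

def batch_grad(w, X, y, lam):
    """Gradient of the l2-regularized logistic loss on a (mini-)batch."""
    z = np.clip(y * (X @ w), -30.0, 30.0)
    p = 1.0 / (1.0 + np.exp(-z))             # sigma(y * x^T w)
    return X.T @ (-y * (1.0 - p)) / len(y) + lam * w

def scg_sarah(X, y, lam=1e-3, epochs=10, batch=32, eps=1e-6, seed=0):
    """Illustrative SARAH-style stochastic CG with a quadratic-model step size."""
    rng = np.random.default_rng(seed)
    n, dim = X.shape
    w = np.zeros(dim)
    for _ in range(epochs):
        v = batch_grad(w, X, y, lam)          # full gradient at the outer snapshot
        direction = -v
        for _ in range(n // batch):
            idx = rng.choice(n, batch, replace=False)
            g = batch_grad(w, X[idx], y[idx], lam)
            # Curvature along the direction via a finite-difference
            # Hessian-vector product: H d ~ (grad(w + eps*d) - grad(w)) / eps.
            Hd = (batch_grad(w + eps * direction, X[idx], y[idx], lam) - g) / eps
            curv = direction @ Hd
            # Minimizer of the local quadratic model: alpha = -(v^T d) / (d^T H d).
            alpha = -(v @ direction) / curv if curv > 1e-12 else 1e-3
            w_new = w + alpha * direction
            # SARAH recursive gradient estimator.
            v_new = batch_grad(w_new, X[idx], y[idx], lam) - g + v
            # Fletcher-Reeves-type conjugate direction update (one common choice).
            beta = (v_new @ v_new) / max(v @ v, 1e-12)
            direction = -v_new + beta * direction
            w, v = w_new, v_new
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((500, 20))
    y = np.sign(X @ rng.standard_normal(20) + 0.1 * rng.standard_normal(500))
    w = scg_sarah(X, y)
    print("final full-batch gradient norm:", np.linalg.norm(batch_grad(w, X, y, 1e-3)))
```

A production version would safeguard the quadratic-model step (for example, restarting the direction when it is not a descent direction), and the paper's practical variant additionally ends the inner loop early via a stopping criterion on the number of inner-loop iterations.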
DOI:10.1016/j.eswa.2022.117719