Adaptive stochastic conjugate gradient for machine learning
| Published in: | Expert Systems with Applications, Volume 206, p. 117719 |
|---|---|
| Main author: | |
| Format: | Journal Article |
| Language: | English |
| Publisher: | Elsevier Ltd, 15.11.2022 |
| Subject: | |
| ISSN: | 0957-4174, 1873-6793 |
| Online access: | Get full text |
| Summary: | Because they converge faster than gradient descent algorithms and cost less per iteration than second-order algorithms, conjugate gradient (CG) algorithms have been widely used in machine learning. This paper considers conjugate gradient in the mini-batch setting. Concretely, we propose a stable adaptive stochastic conjugate gradient (SCG) algorithm by incorporating both the stochastic recursive gradient algorithm (SARAH) and second-order information into a CG-type algorithm. Unlike most existing CG algorithms, which spend considerable time determining the step size by line search and may fail in stochastic optimization, the proposed algorithms use a local quadratic model to estimate the step-size sequence without computing Hessian information, which keeps their computational cost as low as that of first-order algorithms. We establish a linear convergence rate for a class of SCG algorithms when the loss function is strongly convex. Moreover, we show that the complexity of the proposed algorithm matches that of modern stochastic optimization algorithms. As a by-product, we develop a practical variant of the proposed algorithm by setting a stopping criterion on the number of inner-loop iterations. Various numerical experiments on machine learning problems demonstrate the efficiency of the proposed algorithms.
•The efficacy of conjugate gradient with noisy gradients is verified.•The linear convergence result of our method for strongly convex cases is obtained.•Our methods outperform several modern stochastic optimization methods. A minimal illustrative sketch of these ingredients follows the record. |
|---|---|
| DOI: | 10.1016/j.eswa.2022.117719 |
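The abstract above names three ingredients: a SARAH-type recursive gradient estimator, a CG-type search direction, and a step size taken from a local quadratic model rather than a line search. The sketch below is a hedged illustration of how such pieces can fit together on an l2-regularized logistic regression problem; it is not the paper's SCG algorithm. The function names (`grad`, `scg_sketch`), the Fletcher-Reeves choice of conjugacy parameter, the gradient-difference curvature estimate, and all constants are assumptions made for the example.

```python
# Illustrative sketch only: SARAH-style recursive gradient combined with a
# Fletcher-Reeves-type conjugate direction and a Hessian-free quadratic-model
# step size. Assumed names and constants; not the authors' SCG algorithm.
import numpy as np

rng = np.random.default_rng(0)

def grad(w, X, y, lam=1e-2):
    """Mini-batch gradient of the l2-regularized logistic loss (labels in {-1, +1})."""
    p = 1.0 / (1.0 + np.exp(-y * (X @ w)))          # probability of the correct label
    return -(X * (y * (1.0 - p))[:, None]).mean(axis=0) + lam * w

def scg_sketch(X, y, epochs=10, batch=32):
    """SARAH recursive gradient estimator + CG-type direction, mini-batch setting."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        v = grad(w, X, y)            # outer loop: full gradient as the SARAH anchor
        p_dir = -v                   # restart the conjugate direction each epoch
        for _ in range(n // batch):
            idx = rng.choice(n, size=batch, replace=False)
            Xb, yb = X[idx], y[idx]
            g_w = grad(w, Xb, yb)    # mini-batch gradient at the current iterate

            # Step size from a 1-D quadratic model along p_dir; the curvature
            # term p'Hp is approximated by a gradient difference, so no Hessian
            # is ever formed (a heuristic standing in for the paper's rule).
            eps = 1e-4
            curv = ((grad(w + eps * p_dir, Xb, yb) - g_w) @ p_dir) / eps
            alpha = -(v @ p_dir) / curv if curv > 1e-12 else 1e-2
            alpha = float(np.clip(alpha, 0.0, 1.0))

            w_new = w + alpha * p_dir

            # SARAH recursive gradient estimator, evaluated on the same batch.
            v_new = grad(w_new, Xb, yb) - g_w + v

            # Fletcher-Reeves-style conjugacy parameter with stochastic estimators.
            beta = (v_new @ v_new) / max(v @ v, 1e-12)
            p_dir = -v_new + beta * p_dir

            w, v = w_new, v_new
    return w

# Tiny synthetic check: 200 samples, 5 features, labels in {-1, +1}.
X = rng.standard_normal((200, 5))
y = np.sign(X @ rng.standard_normal(5) + 0.1 * rng.standard_normal(200))
w_hat = scg_sketch(X, y)
print("full-batch gradient norm:", np.linalg.norm(grad(w_hat, X, y)))
```

Restarting the direction at each outer loop and clipping the quadratic-model step are pragmatic safeguards chosen for this sketch, not prescriptions taken from the paper.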