Adaptive stochastic conjugate gradient for machine learning
| Published in: | Expert Systems with Applications, Volume 206, p. 117719 |
|---|---|
| Main author: | |
| Format: | Journal Article |
| Language: | English |
| Publisher: | Elsevier Ltd, 15.11.2022 |
| Subject: | |
| ISSN: | 0957-4174, 1873-6793 |
| Online access: | Get full text |
| Summary: | Because they converge faster than gradient descent algorithms and cost less per iteration than second-order algorithms, conjugate gradient (CG) algorithms have been widely used in machine learning. This paper considers conjugate gradient in the mini-batch setting. Concretely, we propose a stable adaptive stochastic conjugate gradient (SCG) algorithm by incorporating both the stochastic recursive gradient algorithm (SARAH) and second-order information into a CG-type algorithm. Unlike most existing CG algorithms, which spend considerable time determining the step size by line search and may fail in stochastic optimization, the proposed algorithms use a local quadratic model to estimate the step-size sequence without computing Hessian information, which keeps their computational cost as low as that of first-order algorithms. We establish a linear convergence rate for a class of SCG algorithms when the loss function is strongly convex. Moreover, we show that the complexity of the proposed algorithm matches that of modern stochastic optimization algorithms. As a by-product, we develop a practical variant of the proposed algorithm by setting a stopping criterion on the number of inner-loop iterations. Various numerical experiments on machine learning problems demonstrate the efficiency of the proposed algorithms.
•The efficacy of conjugate gradient with noisy gradients is verified.•The linear convergence result of our method for strongly convex cases is obtained.•Our methods outperform several modern stochastic optimization methods. A minimal illustrative sketch of these ingredients follows the record. |
|---|---|
| DOI: | 10.1016/j.eswa.2022.117719 |
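The abstract above names three ingredients: a SARAH-type recursive gradient estimator, a CG-type search direction, and a step size taken from a local quadratic model rather than a line search. The sketch below is a hedged illustration of how such pieces can fit together on an l2-regularized logistic regression problem; it is not the paper's SCG algorithm. The function names (`grad`, `scg_sketch`), the Fletcher-Reeves choice of conjugacy parameter, the gradient-difference curvature estimate, and all constants are assumptions made for the example.

```python
# Illustrative sketch only: SARAH-style recursive gradient combined with a
# Fletcher-Reeves-type conjugate direction and a Hessian-free quadratic-model
# step size. Assumed names and constants; not the authors' SCG algorithm.
import numpy as np

rng = np.random.default_rng(0)

def grad(w, X, y, lam=1e-2):
    """Mini-batch gradient of the l2-regularized logistic loss (labels in {-1, +1})."""
    p = 1.0 / (1.0 + np.exp(-y * (X @ w)))          # probability of the correct label
    return -(X * (y * (1.0 - p))[:, None]).mean(axis=0) + lam * w

def scg_sketch(X, y, epochs=10, batch=32):
    """SARAH recursive gradient estimator + CG-type direction, mini-batch setting."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        v = grad(w, X, y)            # outer loop: full gradient as the SARAH anchor
        p_dir = -v                   # restart the conjugate direction each epoch
        for _ in range(n // batch):
            idx = rng.choice(n, size=batch, replace=False)
            Xb, yb = X[idx], y[idx]
            g_w = grad(w, Xb, yb)    # mini-batch gradient at the current iterate

            # Step size from a 1-D quadratic model along p_dir; the curvature
            # term p'Hp is approximated by a gradient difference, so no Hessian
            # is ever formed (a heuristic standing in for the paper's rule).
            eps = 1e-4
            curv = ((grad(w + eps * p_dir, Xb, yb) - g_w) @ p_dir) / eps
            alpha = -(v @ p_dir) / curv if curv > 1e-12 else 1e-2
            alpha = float(np.clip(alpha, 0.0, 1.0))

            w_new = w + alpha * p_dir

            # SARAH recursive gradient estimator, evaluated on the same batch.
            v_new = grad(w_new, Xb, yb) - g_w + v

            # Fletcher-Reeves-style conjugacy parameter with stochastic estimators.
            beta = (v_new @ v_new) / max(v @ v, 1e-12)
            p_dir = -v_new + beta * p_dir

            w, v = w_new, v_new
    return w

# Tiny synthetic check: 200 samples, 5 features, labels in {-1, +1}.
X = rng.standard_normal((200, 5))
y = np.sign(X @ rng.standard_normal(5) + 0.1 * rng.standard_normal(200))
w_hat = scg_sketch(X, y)
print("full-batch gradient norm:", np.linalg.norm(grad(w_hat, X, y)))
```

Restarting the direction at each outer loop and clipping the quadratic-model step are pragmatic safeguards chosen for this sketch, not prescriptions taken from the paper.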