SARAH-M: A fast stochastic recursive gradient descent algorithm via momentum
| Published in: | Expert Systems with Applications Vol. 238; p. 122295 |
|---|---|
| Main Author: | |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Ltd, 15.03.2024 |
| ISSN: | 0957-4174 |
| Summary: | As a simple but effective technique, the momentum method has been widely adopted in stochastic optimization algorithms for large-scale machine learning, and the success of stochastic optimization with a momentum term has been widely reported in machine learning and related areas. However, the understanding of how momentum improves modern variance-reduced stochastic gradient algorithms, e.g., the stochastic dual coordinate ascent (SDCA) method, the stochastically controlled stochastic gradient (SCSG) method, and the stochastic recursive gradient algorithm (SARAH), is still limited. To tackle this issue, this work studies the performance of SARAH with a momentum term theoretically and empirically, and develops a novel variance-reduced stochastic gradient algorithm, termed SARAH-M. We rigorously prove that SARAH-M attains a linear rate of convergence when minimizing strongly convex functions. We further propose an adaptive SARAH-M method (abbreviated as AdaSARAH-M) by incorporating the random Barzilai–Borwein (RBB) technique into SARAH-M, which provides an easy way to determine the step size for the original SARAH-M algorithm. A theoretical analysis showing that AdaSARAH-M also enjoys a linear convergence rate is provided. Moreover, we show that the complexity of the proposed algorithms improves upon that of modern stochastic optimization algorithms. Finally, numerical results on benchmark machine learning problems, compared against state-of-the-art algorithms, verify the efficacy of momentum in variance-reduced stochastic gradient algorithms. |
|---|---|
| Highlights: | • The efficacy of the variance-reduced method with momentum is verified. • An adaptive variance-reduced method with momentum is proposed. • The convergence properties of the proposed methods are provided. • Experimental results show great promise on standard machine learning tasks. |
| DOI: | 10.1016/j.eswa.2023.122295 |
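The abstract describes combining SARAH's recursive variance-reduced gradient estimate with a momentum term. A minimal illustrative sketch of that combination is given below: a generic SARAH recursion with a heavy-ball momentum step, applied to a least-squares objective. This is an assumption-laden sketch, not the paper's exact SARAH-M update or its RBB step-size rule; all function names and parameter values are chosen for illustration only.

```python
import numpy as np

def sarah_momentum(A, b, w0, eta=0.02, beta=0.1, epochs=30, seed=0):
    # SARAH-style recursive gradient descent with a heavy-ball momentum
    # term, illustrated on least squares f(w) = (1/2n) * ||A w - b||^2.
    # NOTE: a generic sketch, not the exact SARAH-M algorithm of the paper.
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    w_prev = w0.copy()
    w = w0.copy()
    for _ in range(epochs):
        # Outer step: anchor the recursive estimator at the full gradient.
        v = A.T @ (A @ w - b) / n
        w_prev, w = w, w - eta * v + beta * (w - w_prev)
        for _ in range(n):
            i = rng.integers(n)
            # SARAH recursion: v_t = grad_i(w_t) - grad_i(w_{t-1}) + v_{t-1}
            v = A[i] * (A[i] @ w - b[i]) - A[i] * (A[i] @ w_prev - b[i]) + v
            # Heavy-ball step: gradient step plus momentum on the last move.
            w_prev, w = w, w - eta * v + beta * (w - w_prev)
    return w

# Tiny demo: recover w_star from noiseless linear observations.
rng = np.random.default_rng(42)
A = rng.standard_normal((50, 5))
w_star = rng.standard_normal(5)
b = A @ w_star
w_hat = sarah_momentum(A, b, np.zeros(5))
```

The outer full-gradient pass and the inner single-sample recursion are the standard SARAH structure; the `beta * (w - w_prev)` term is one common way to attach momentum to such an update.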