Parallel and distributed asynchronous adaptive stochastic gradient methods
Stochastic gradient methods (SGMs) are the predominant approaches to train deep learning models. The adaptive versions (e.g., Adam and AMSGrad) have been extensively used in practice, partly because they achieve faster convergence than the non-adaptive versions while incurring little overhead. On th...
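The abstract contrasts adaptive SGMs such as Adam and AMSGrad with non-adaptive SGD. As background (not the paper's proposed parallel/asynchronous method), a minimal sketch of a single Adam update, following the standard bias-corrected moment estimates; the function name and hyperparameter defaults here are the conventional ones, not taken from this paper:

```python
import math

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update on a scalar parameter.

    m, v are exponential moving averages of the gradient and squared
    gradient; t is the 1-based step count used for bias correction.
    """
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)   # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)   # bias-corrected second moment
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimize f(x) = x^2 starting from x = 1.0.
x, m, v = 1.0, 0.0, 0.0
for t in range(1, 501):
    x, m, v = adam_step(x, 2 * x, m, v, t, lr=0.05)
```

The per-coordinate scaling by `sqrt(v_hat)` is what makes the method "adaptive": each parameter gets an effective step size normalized by its own gradient magnitude history, which is the property the abstract credits for fast convergence at little extra cost.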
| Published in: | Mathematical programming computation, Vol. 15, no. 3, pp. 471-508 |
|---|---|
| Main Authors: | |
| Format: | Journal Article |
| Language: | English |
| Published: | Berlin/Heidelberg: Springer Berlin Heidelberg (Springer Nature B.V), 01.09.2023 |
| ISSN: | 1867-2949, 1867-2957 |