Parallel and distributed asynchronous adaptive stochastic gradient methods

Stochastic gradient methods (SGMs) are the predominant approach to training deep learning models. Adaptive variants (e.g., Adam and AMSGrad) have been used extensively in practice, partly because they converge faster than non-adaptive versions while incurring little overhead. On th...
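For context, a minimal sketch of the Adam update rule that such adaptive SGMs are built on (this is the standard Adam of Kingma & Ba with common default hyperparameters, shown here only as illustration; it is not the parallel/asynchronous method proposed in the paper):

```python
import math

def adam_step(x, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter x given its gradient."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias corrections
    v_hat = v / (1 - beta2 ** t)
    x = x - lr * m_hat / (math.sqrt(v_hat) + eps)  # adaptive step
    return x, m, v

# Toy usage: minimize f(x) = x^2, whose gradient is 2x.
x, m, v = 5.0, 0.0, 0.0
for t in range(1, 2001):
    x, m, v = adam_step(x, 2 * x, m, v, t, lr=0.1)
```

The per-coordinate scaling by the second-moment estimate is what makes the method "adaptive": each parameter gets its own effective step size.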


Bibliographic Details
Published in: Mathematical Programming Computation, Vol. 15, No. 3, pp. 471-508
Main Authors: Xu, Yangyang, Xu, Yibo, Yan, Yonggui, Sutcher-Shepard, Colin, Grinberg, Leopold, Chen, Jie
Format: Journal Article
Language:English
Published: Berlin/Heidelberg: Springer Berlin Heidelberg, 01.09.2023 (Springer Nature B.V.)
ISSN:1867-2949, 1867-2957