Parallel and distributed asynchronous adaptive stochastic gradient methods

Stochastic gradient methods (SGMs) are the predominant approaches to train deep learning models. The adaptive versions (e.g., Adam and AMSGrad) have been extensively used in practice, partly because they achieve faster convergence than the non-adaptive versions while incurring little overhead. On th...

Bibliographic details
Published in:	Mathematical Programming Computation, Vol. 15, No. 3, pp. 471-508
Main authors:	Xu, Yangyang; Xu, Yibo; Yan, Yonggui; Sutcher-Shepard, Colin; Grinberg, Leopold; Chen, Jie
Format:	Journal Article
Language:	English
Published:	Berlin/Heidelberg: Springer Berlin Heidelberg, 01.09.2023; Springer Nature B.V.
Subjects:	Convergence; Deep learning; Full Length Paper; Machine learning; Mathematics; Mathematics and Statistics; Mathematics of Computing; Operations Research/Decision Theory; Optimization; Staling; Theory of Computation; 65K05; Stochastic gradient method; 90C15; 68W15; Adaptive learning rate; 65Y05
ISSN:	1867-2949, 1867-2957