Parallel and distributed asynchronous adaptive stochastic gradient methods

Stochastic gradient methods (SGMs) are the predominant approaches to train deep learning models. The adaptive versions (e.g., Adam and AMSGrad) have been extensively used in practice, partly because they achieve faster convergence than the non-adaptive versions while incurring little overhead. On th...

Bibliographic details
Published in:	Mathematical Programming Computation, Vol. 15, No. 3, pp. 471-508
Main authors:	Xu, Yangyang; Xu, Yibo; Yan, Yonggui; Sutcher-Shepard, Colin; Grinberg, Leopold; Chen, Jie
Format:	Journal Article
Language:	English
Publication details:	Berlin/Heidelberg: Springer Berlin Heidelberg, 1 September 2023; Springer Nature B.V.
Subjects:	Convergence; Deep learning; Full Length Paper; Machine learning; Mathematics; Mathematics and Statistics; Mathematics of Computing; Operations Research/Decision Theory; Optimization; Staling; Theory of Computation; Stochastic gradient method; Adaptive learning rate; 65K05; 90C15; 68W15; 65Y05
ISSN:	1867-2949, 1867-2957