Parallel and distributed asynchronous adaptive stochastic gradient methods
Stochastic gradient methods (SGMs) are the predominant approaches to train deep learning models. The adaptive versions (e.g., Adam and AMSGrad) have been extensively used in practice, partly because they achieve faster convergence than the non-adaptive versions while incurring little overhead. On th...
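The abstract contrasts adaptive SGMs such as Adam and AMSGrad with non-adaptive SGD. As background (not the paper's proposed parallel/asynchronous method), a minimal sketch of a single Adam update, following the standard bias-corrected moment estimates; the function name and hyperparameter defaults here are the conventional ones, not taken from this paper:

```python
import math

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update on a scalar parameter.

    m, v are exponential moving averages of the gradient and squared
    gradient; t is the 1-based step count used for bias correction.
    """
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)   # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)   # bias-corrected second moment
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimize f(x) = x^2 starting from x = 1.0.
x, m, v = 1.0, 0.0, 0.0
for t in range(1, 501):
    x, m, v = adam_step(x, 2 * x, m, v, t, lr=0.05)
```

The per-coordinate scaling by `sqrt(v_hat)` is what makes the method "adaptive": each parameter gets an effective step size normalized by its own gradient magnitude history, which is the property the abstract credits for fast convergence at little extra cost.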
| Published in: | Mathematical programming computation, Vol. 15, no. 3, pp. 471-508 |
|---|---|
| Main Authors: | |
| Format: | Journal Article |
| Language: | English |
| Published: | Berlin/Heidelberg: Springer Berlin Heidelberg (Springer Nature B.V), 01.09.2023 |
| ISSN: | 1867-2949, 1867-2957 |