Stochastic Normalized Gradient Descent with Momentum for Large-Batch Training

Stochastic gradient descent (SGD) and its variants have been the dominant optimization methods in machine learning. Compared to SGD with small-batch training, SGD with large-batch training can better utilize the computational power of current multi-core systems such as graphics processing units (GPUs)…
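
The method named in the title normalizes a momentum-based update direction before applying it. As a rough illustration only (the function name sngm_step, the heavy-ball momentum form, and all hyperparameter values are assumptions here, not taken from the paper), a minimal NumPy sketch of one such step might look like:

    import numpy as np

    def sngm_step(w, grad, buf, lr=0.1, beta=0.9, eps=1e-8):
        # Hypothetical sketch of a normalized momentum update, not the
        # paper's exact rule. Accumulate a heavy-ball momentum buffer.
        buf = beta * buf + grad
        # Step along the normalized momentum direction; normalization keeps
        # the step length bounded regardless of the gradient's magnitude.
        w = w - lr * buf / (np.linalg.norm(buf) + eps)
        return w, buf

    # Toy usage: minimize f(w) = 0.5 * ||w||^2, whose gradient is w itself.
    w = np.array([4.0, -2.0])
    buf = np.zeros_like(w)
    for _ in range(100):
        w, buf = sngm_step(w, grad=w, buf=buf)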

Bibliographic Details
Published in: arXiv.org
Main Authors: Shen-Yi Zhao, Chang-Wei Shi, Yin-Peng Xie, Wu-Jun Li
Format: Paper
Language: English
Published: Ithaca: Cornell University Library, arXiv.org, 15.04.2024
ISSN: 2331-8422