Stochastic Normalized Gradient Descent with Momentum for Large-Batch Training

Stochastic gradient descent (SGD) and its variants have been the dominant optimization methods in machine learning. Compared to SGD with small-batch training, SGD with large-batch training can better utilize the computational power of current multi-core systems such as graphics processing units (GPUs)…
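
The method named in the title normalizes a momentum-based update direction before applying it. As a rough illustration only (the function name sngm_step, the heavy-ball momentum form, and all hyperparameter values are assumptions here, not taken from the paper), a minimal NumPy sketch of one such step might look like:

    import numpy as np

    def sngm_step(w, grad, buf, lr=0.1, beta=0.9, eps=1e-8):
        # Hypothetical sketch of a normalized momentum update, not the
        # paper's exact rule. Accumulate a heavy-ball momentum buffer.
        buf = beta * buf + grad
        # Step along the normalized momentum direction; normalization keeps
        # the step length bounded regardless of the gradient's magnitude.
        w = w - lr * buf / (np.linalg.norm(buf) + eps)
        return w, buf

    # Toy usage: minimize f(w) = 0.5 * ||w||^2, whose gradient is w itself.
    w = np.array([4.0, -2.0])
    buf = np.zeros_like(w)
    for _ in range(100):
        w, buf = sngm_step(w, grad=w, buf=buf)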

Bibliographic Details
Published in: arXiv.org
Main Authors: Shen-Yi Zhao, Chang-Wei Shi, Yin-Peng Xie, Wu-Jun Li
Format: Paper
Language: English
Published: Ithaca: Cornell University Library, arXiv.org, 15.04.2024
ISSN: 2331-8422