Unified Algorithm Framework for Nonconvex Stochastic Optimization in Deep Neural Networks

This paper presents a unified algorithmic framework for nonconvex stochastic optimization, which is needed to train deep neural networks. The unified algorithm includes the existing adaptive-learning-rate optimization algorithms, such as Adaptive Moment Estimation (Adam), Adaptive Mean Square Gradie...

Celý popis

Uložené v:

Podrobná bibliografia
Vydané v:	IEEE access Ročník 9; s. 143807 - 143823
Hlavní autori:	Zhu, Yini, Iiduka, Hideaki
Médium:	Journal Article
Jazyk:	English
Vydavateľské údaje:	Piscataway IEEE 2021 The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Predmet:	Adam Adaptive algorithms Adaptive learning adaptive-learning-rate optimization algorithm AMSGrad AMSGWDC Approximation algorithms Artificial neural networks Convergence Convex functions Deep learning deep neural network GWDC Heuristic algorithms heuristic intelligent optimization methods learning rate Machine learning Neural networks nonconvex stochastic optimization Optimization Optimization algorithms stationary point problem Symmetric matrices
ISSN:	2169-3536, 2169-3536
On-line prístup:	Získať plný text
Tagy:	Pridať tag Žiadne tagy, Buďte prvý, kto otaguje tento záznam!

Popis
Shrnutí:	This paper presents a unified algorithmic framework for nonconvex stochastic optimization, which is needed to train deep neural networks. The unified algorithm includes the existing adaptive-learning-rate optimization algorithms, such as Adaptive Moment Estimation (Adam), Adaptive Mean Square Gradient (AMSGrad), Adam with weighted gradient and dynamic bound of learning rate (GWDC), AMSGrad with weighted gradient and dynamic bound of learning rate (AMSGWDC), and Adapting stepsizes by the belief in observed gradients (AdaBelief). The paper also gives convergence analyses of the unified algorithm for constant and diminishing learning rates. When using a constant learning rate, the algorithm can approximate a stationary point of a nonconvex stochastic optimization problem. When using a diminishing rate, it converges to a stationary point of the problem. Hence, the analyses lead to the finding that the existing adaptive-learning-rate optimization algorithms can be applied to nonconvex stochastic optimization in deep neural networks in theory. Additionally, this paper provides numerical results showing that the unified algorithm can train deep neural networks in practice. Moreover, it provides numerical comparisons for unconstrained minimization using benchmark functions of the unified algorithm with certain heuristic intelligent optimization algorithms. The numerical comparisons show that a teaching-learning-based optimization algorithm and the unified algorithm perform well.
Bibliografia:	ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14
ISSN:	2169-3536 2169-3536
DOI:	10.1109/ACCESS.2021.3120749