A non-monotone trust-region method with noisy oracles and additional sampling

Detailed bibliography
Published in:	Computational Optimization and Applications, Volume 89, Issue 1, pp. 247-278
Main authors:	Krejić, Nataša; Krklec Jerinkić, Nataša; Martínez, Ángeles; Yousefi, Mahsa
Format:	Journal Article
Language:	English
Published:	New York: Springer US, 1 September 2024; Springer Nature B.V.
Subjects:	Adaptive algorithms; Adaptive sampling; Algorithms; Approximation; Artificial neural networks; Convergence; Convex and Discrete Geometry; Convexity; Error analysis; Image classification; Management Science; Mathematics; Mathematics and Statistics; Neural networks; Operations Research; Operations Research/Decision Theory; Optimization; Sample size; State-of-the-art reviews; Statistics; 90C53; 90C30; 65K05; Second-order methods; Deep neural networks training; 90C06; 90C90; Stochastic optimization; Non-monotone trust-region; Quasi-Newton
ISSN:	0926-6003, 1573-2894

Description
Summary:	In this work, we introduce a novel stochastic second-order method, within the framework of a non-monotone trust-region approach, for solving the unconstrained, nonlinear, and non-convex optimization problems arising in the training of deep neural networks. The proposed algorithm makes use of subsampling strategies that yield noisy approximations of the finite sum objective function and its gradient. We introduce an adaptive sample size strategy based on inexpensive additional sampling to control the resulting approximation error. Depending on the estimated progress of the algorithm, this can yield sample size scenarios ranging from mini-batch to full sample functions. We provide convergence analysis for all possible scenarios and show that the proposed method achieves almost sure convergence under standard assumptions for the trust-region framework. We report numerical experiments showing that the proposed algorithm outperforms its state-of-the-art counterpart in deep neural network training for image classification and regression tasks while requiring a significantly smaller number of gradient evaluations.
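Illustration:	The summary names three ingredients: subsampled (noisy) function and gradient values, a non-monotone trust-region acceptance test, and a sample size that can grow based on a cheap additional sample. The sketch below is a toy one-dimensional illustration of how such pieces can fit together, not the authors' algorithm. Everything in it is an assumption made for illustration: the function names, the noise-free least-squares data, the unit-curvature model (standing in for a quasi-Newton Hessian approximation), the acceptance threshold, and the batch-doubling rule.

```python
import random


def make_toy_data(n=64, seed=1):
    """Hypothetical noise-free data (a_i, b_i) with b_i = 2 * a_i, so w* = 2."""
    rng = random.Random(seed)
    return [(a, 2.0 * a) for a in (rng.uniform(0.5, 1.5) for _ in range(n))]


def batch_loss_grad(w, data, idx):
    """Subsampled least-squares loss and gradient over the indices in idx."""
    n = len(idx)
    loss = sum(0.5 * (data[i][0] * w - data[i][1]) ** 2 for i in idx) / n
    grad = sum(data[i][0] * (data[i][0] * w - data[i][1]) for i in idx) / n
    return loss, grad


def nonmonotone_tr(data, w=0.0, radius=1.0, batch=8, extra=4, memory=5,
                   eta=0.1, iters=200, seed=0):
    """Toy non-monotone trust-region loop with subsampling (illustrative only)."""
    rng = random.Random(seed)
    n_total = len(data)
    history = []  # subsampled losses of recently accepted iterates
    for _ in range(iters):
        idx = rng.sample(range(n_total), batch)
        f, g = batch_loss_grad(w, data, idx)
        if abs(g) < 1e-12:
            continue  # subsampled gradient is numerically zero; resample
        # Model m(s) = f + g*s + 0.5*s**2 (assumed unit curvature),
        # minimized over the trust region |s| <= radius.
        s = max(-radius, min(radius, -g))
        predicted = -(g * s + 0.5 * s * s)  # positive whenever g != 0
        f_trial, _ = batch_loss_grad(w + s, data, idx)
        # Non-monotone ratio: compare the trial loss against the worst
        # recent loss rather than only the current one.
        reference = max(history + [f])
        rho = (reference - f_trial) / predicted
        # Additional sampling (assumed rule): an independent small sample
        # re-checks the trial point; disagreement doubles the batch size.
        check = rng.sample(range(n_total), extra)
        f_chk, _ = batch_loss_grad(w, data, check)
        f_chk_trial, _ = batch_loss_grad(w + s, data, check)
        if f_chk_trial > f_chk:
            batch = min(n_total, 2 * batch)
        if rho >= eta:
            w += s  # accept the step and enlarge the region
            history = (history + [f_trial])[-memory:]
            radius = min(2.0 * radius, 10.0)
        else:
            radius *= 0.5  # reject the step and shrink the region
    return w, batch
```

Calling `nonmonotone_tr(make_toy_data())` returns the final iterate together with the final batch size. Because the toy data are consistent, every subsample shares the minimizer `w = 2`, so the batch rarely needs to grow; with noisy data the additional-sample check would be expected to trigger more often and push the batch toward the full sample.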
DOI:	10.1007/s10589-024-00580-w