Sample size selection in optimization methods for machine learning

Bibliographic Details
Published in: Mathematical Programming, Vol. 134, No. 1, pp. 127–155
Main Authors: Byrd, Richard H.; Chin, Gillian M.; Nocedal, Jorge; Wu, Yuchen
Format: Journal Article; Conference Paper
Language: English
Published: Berlin/Heidelberg: Springer-Verlag, 01.08.2012
ISSN: 0025-5610, 1436-4646
DOI: 10.1007/s10107-012-0572-5
Description
Summary: This paper presents a methodology for using varying sample sizes in batch-type optimization methods for large-scale machine learning problems. The first part of the paper deals with the delicate issue of dynamic sample selection in the evaluation of the function and gradient. We propose a criterion for increasing the sample size based on variance estimates obtained during the computation of a batch gradient. We establish an O(1/ε) complexity bound on the total cost of a gradient method. The second part of the paper describes a practical Newton method that uses a smaller sample to compute Hessian-vector products than to evaluate the function and the gradient, and that also employs a dynamic sampling technique. The focus shifts in the third part of the paper to L1-regularized problems designed to produce sparse solutions. We propose a Newton-like method that consists of two phases: a (minimalistic) gradient projection phase that identifies zero variables, and a subspace phase that applies a subsampled Hessian Newton iteration in the free variables. Numerical tests on speech recognition problems illustrate the performance of the algorithms.
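The variance-based criterion described in the first part of the abstract is concrete enough to sketch. The Python/NumPy fragment below shows one way such a test could look: the current sample S is accepted when the estimated gradient variance is small relative to the squared norm of the batch gradient, and otherwise a sample size just large enough to restore that inequality is suggested. The function name, the accuracy parameter theta, and the exact form of the variance estimate are illustrative assumptions, not code from the paper.

    import numpy as np

    def dynamic_sample_test(per_example_grads, theta=0.5):
        """Variance-based sample-size test (illustrative sketch only).

        per_example_grads: array of shape (|S|, d) holding the gradient
        of each sampled loss term at the current iterate (|S| >= 2);
        theta > 0 sets the tolerated noise level. Returns a pair
        (accept_current_sample, suggested_sample_size).
        """
        S = per_example_grads.shape[0]
        g = per_example_grads.mean(axis=0)        # batch gradient over S
        g_norm_sq = float(np.dot(g, g))
        if g_norm_sq == 0.0:
            # Stationary point of the sampled objective; nothing to test.
            return True, S
        # Coordinate-wise sample variance of the per-example gradients,
        # aggregated into a scalar estimate of the gradient noise.
        var = float(per_example_grads.var(axis=0, ddof=1).sum())
        # Accept the sample when var/|S| <= theta^2 * ||g||^2; otherwise
        # suggest a size just large enough to restore the inequality.
        if var / S <= theta**2 * g_norm_sq:
            return True, S
        return False, int(np.ceil(var / (theta**2 * g_norm_sq)))

In a complete method, the suggested size would drive the sample used for the next gradient evaluation; the abstract's second part applies a similar idea with a still smaller subsample reserved for Hessian-vector products inside a Newton iteration.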