Sample size selection in optimization methods for machine learning

Bibliographic Details
Published in: Mathematical Programming, Vol. 134, No. 1, pp. 127–155
Main Authors: Byrd, Richard H.; Chin, Gillian M.; Nocedal, Jorge; Wu, Yuchen
Format: Journal Article; Conference Paper
Language: English
Published: Berlin/Heidelberg: Springer-Verlag, 01.08.2012
ISSN: 0025-5610, 1436-4646
DOI: 10.1007/s10107-012-0572-5
Description
Summary: This paper presents a methodology for using varying sample sizes in batch-type optimization methods for large-scale machine learning problems. The first part of the paper deals with the delicate issue of dynamic sample selection in the evaluation of the function and gradient. We propose a criterion for increasing the sample size based on variance estimates obtained during the computation of a batch gradient. We establish an O(1/ε) complexity bound on the total cost of a gradient method. The second part of the paper describes a practical Newton method that uses a smaller sample to compute Hessian-vector products than to evaluate the function and the gradient, and that also employs a dynamic sampling technique. The focus shifts in the third part of the paper to L1-regularized problems designed to produce sparse solutions. We propose a Newton-like method that consists of two phases: a (minimalistic) gradient projection phase that identifies zero variables, and a subspace phase that applies a subsampled Hessian Newton iteration in the free variables. Numerical tests on speech recognition problems illustrate the performance of the algorithms.
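The variance-based criterion described in the first part of the abstract is concrete enough to sketch. The Python/NumPy fragment below shows one way such a test could look: the current sample S is accepted when the estimated gradient variance is small relative to the squared norm of the batch gradient, and otherwise a sample size just large enough to restore that inequality is suggested. The function name, the accuracy parameter theta, and the exact form of the variance estimate are illustrative assumptions, not code from the paper.

    import numpy as np

    def dynamic_sample_test(per_example_grads, theta=0.5):
        """Variance-based sample-size test (illustrative sketch only).

        per_example_grads: array of shape (|S|, d) holding the gradient
        of each sampled loss term at the current iterate (|S| >= 2);
        theta > 0 sets the tolerated noise level. Returns a pair
        (accept_current_sample, suggested_sample_size).
        """
        S = per_example_grads.shape[0]
        g = per_example_grads.mean(axis=0)        # batch gradient over S
        g_norm_sq = float(np.dot(g, g))
        if g_norm_sq == 0.0:
            # Stationary point of the sampled objective; nothing to test.
            return True, S
        # Coordinate-wise sample variance of the per-example gradients,
        # aggregated into a scalar estimate of the gradient noise.
        var = float(per_example_grads.var(axis=0, ddof=1).sum())
        # Accept the sample when var/|S| <= theta^2 * ||g||^2; otherwise
        # suggest a size just large enough to restore the inequality.
        if var / S <= theta**2 * g_norm_sq:
            return True, S
        return False, int(np.ceil(var / (theta**2 * g_norm_sq)))

In a complete method, the suggested size would drive the sample used for the next gradient evaluation; the abstract's second part applies a similar idea with a still smaller subsample reserved for Hessian-vector products inside a Newton iteration.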