Sample size selection in optimization methods for machine learning

Bibliographic Details
Published in: Mathematical Programming, Vol. 134, No. 1, pp. 127-155
Main Authors: Byrd, Richard H.; Chin, Gillian M.; Nocedal, Jorge; Wu, Yuchen
Format: Journal Article; Conference Proceeding
Language:English
Published: Berlin/Heidelberg: Springer-Verlag, 01.08.2012
ISSN: 0025-5610, 1436-4646
Description
Summary: This paper presents a methodology for using varying sample sizes in batch-type optimization methods for large-scale machine learning problems. The first part of the paper deals with the delicate issue of dynamic sample selection in the evaluation of the function and gradient. We propose a criterion for increasing the sample size based on variance estimates obtained during the computation of a batch gradient, and establish a complexity bound on the total cost of a gradient method. The second part of the paper describes a practical Newton method that uses a smaller sample to compute Hessian-vector products than to evaluate the function and the gradient, and that also employs a dynamic sampling technique. The focus shifts in the third part of the paper to L1-regularized problems designed to produce sparse solutions. We propose a Newton-like method that consists of two phases: a (minimalistic) gradient projection phase that identifies zero variables, and a subspace phase that applies a subsampled Hessian Newton iteration in the free variables. Numerical tests on speech recognition problems illustrate the performance of the algorithms.
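The sample-size criterion described in the first part can be illustrated with a short sketch. The following is a minimal, hypothetical Python rendering of a variance-based test of the kind the abstract describes: the estimated variance of the batch gradient is compared against its squared norm, and the sample grows when the test fails. The threshold theta, the particular variance estimate, and the growth rule below are assumptions for illustration, not the paper's exact formulas.

```python
import numpy as np

def should_grow_sample(per_example_grads, theta=0.9):
    """Hedged sketch of a variance-based sample-size test.

    per_example_grads: array of shape (S, d) holding the gradient of each
    sampled example's loss at the current iterate.
    Returns (grow, suggested_size).
    """
    S, _ = per_example_grads.shape
    g = per_example_grads.mean(axis=0)          # batch (sample) gradient
    # Estimated variance of the batch gradient: componentwise sample
    # variances, summed, then divided by the sample size.
    var_term = per_example_grads.var(axis=0, ddof=1).sum() / S
    g_norm_sq = float(g @ g)
    # Keep the current sample if its gradient is reliable relative to its
    # own magnitude; otherwise grow the sample.
    if var_term <= theta**2 * g_norm_sq:
        return False, S
    # Heuristic new size that would pass the test if the per-example
    # variance stayed roughly constant.
    new_S = int(np.ceil(var_term * S / (theta**2 * g_norm_sq)))
    return True, max(new_S, S + 1)
```

In a training loop, a test of this kind would run after each batch-gradient evaluation, with the method resampling at the suggested size whenever it returns True.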
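The second part's idea, using a smaller subsample for Hessian-vector products than for the function and gradient, can likewise be sketched. Below is a hedged Python illustration, not the paper's algorithm: Hessian-vector products are approximated by finite differences of subsampled gradients and fed to a conjugate-gradient solve for the Newton step. The grad_fn interface, the finite-difference step eps, and the CG parameters are all hypothetical.

```python
import numpy as np

def hvp(grad_fn, w, v, hess_idx, eps=1e-6):
    """Finite-difference Hessian-vector product over a small subsample.
    grad_fn(w, idx) must return the loss gradient averaged over the
    examples in idx (a hypothetical interface)."""
    return (grad_fn(w + eps * v, hess_idx) - grad_fn(w, hess_idx)) / eps

def newton_cg_step(grad_fn, w, grad_idx, hess_idx, cg_iters=10, tol=1e-4):
    """Approximate Newton step p solving H p = -g by conjugate gradients.
    g is computed on the (larger) gradient sample grad_idx, while each
    Hessian-vector product uses the (smaller) subsample hess_idx."""
    g = grad_fn(w, grad_idx)
    p = np.zeros_like(w)
    r = -g                      # residual of H p = -g at p = 0
    d = r.copy()
    rs = float(r @ r)
    for _ in range(cg_iters):
        Hd = hvp(grad_fn, w, d, hess_idx)
        alpha = rs / float(d @ Hd)
        p = p + alpha * d
        r = r - alpha * Hd
        rs_new = float(r @ r)
        if np.sqrt(rs_new) < tol:
            break
        d = r + (rs_new / rs) * d
        rs = rs_new
    return p
```

Because each CG iteration costs one Hessian-vector product, shrinking the Hessian subsample reduces per-iteration cost roughly in proportion, which is the trade-off the abstract alludes to.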
DOI: 10.1007/s10107-012-0572-5