A Hybrid Stochastic-Deterministic Minibatch Proximal Gradient Method for Efficient Optimization and Generalization

Detailed Bibliography
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 44, Issue 10, pp. 5933-5946
Main Authors: Zhou, Pan; Yuan, Xiao-Tong; Lin, Zhouchen; Hoi, Steven C.H.
Format: Journal Article
Language: English
Publication Details: United States: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 1 October 2022
ISSN: 0162-8828, 1939-3539, 2160-9292
Description
Summary: Despite the success of stochastic variance-reduced gradient (SVRG) algorithms in solving large-scale problems, their stochastic gradient complexity often scales linearly with the data size and becomes expensive for huge datasets. Accordingly, we propose a hybrid stochastic-deterministic minibatch proximal gradient (HSDMPG) algorithm for strongly convex problems with a linear prediction structure, e.g., least squares and logistic/softmax regression. HSDMPG enjoys an improved computational complexity that is independent of the data size for large-scale problems. It iteratively samples an evolving minibatch of individual losses to estimate the original problem, and can efficiently minimize the sampled subproblems. For a strongly convex loss of $n$ components, HSDMPG attains an $\epsilon$-optimization error within $\mathcal{O}\big(\kappa \log^{\zeta+1}(\tfrac{1}{\epsilon})\tfrac{1}{\epsilon} \,\wedge\, n\log^{\zeta}(\tfrac{1}{\epsilon})\big)$ stochastic gradient evaluations, where $\kappa$ is the condition number, $\zeta = 1$ for quadratic loss, and $\zeta = 2$ for generic loss. For large-scale problems, this complexity outperforms those of SVRG-type algorithms with or without dependence on the data size.
In particular, when $\epsilon = \mathcal{O}(1/\sqrt{n})$, which matches the intrinsic excess error of a learning model and is sufficient for generalization, the complexity for quadratic and generic losses is $\mathcal{O}(n^{0.5}\log^{2}(n))$ and $\mathcal{O}(n^{0.5}\log^{3}(n))$ respectively, which for the first time achieves optimal generalization in less than a single pass over the data. In addition, we extend HSDMPG to online strongly convex problems and prove its higher efficiency over prior algorithms. Numerical results demonstrate the computational advantages of HSDMPG.
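Note on the quoted rates: the $\epsilon = \mathcal{O}(1/\sqrt{n})$ figures follow from the first term of the complexity bound. With $1/\epsilon = \mathcal{O}(\sqrt{n})$ and $\log(1/\epsilon) = \mathcal{O}(\log n)$, and treating $\kappa$ as a constant, $\kappa \log^{\zeta+1}(\tfrac{1}{\epsilon})\tfrac{1}{\epsilon} = \mathcal{O}(n^{0.5}\log^{\zeta+1}(n))$, i.e., $\mathcal{O}(n^{0.5}\log^{2}(n))$ for $\zeta = 1$ and $\mathcal{O}(n^{0.5}\log^{3}(n))$ for $\zeta = 2$; the second term $n\log^{\zeta}(\tfrac{1}{\epsilon})$ grows faster in $n$, so the minimum ($\wedge$) is attained by the first term.
The sketch below illustrates the evolving-minibatch structure described in the abstract on an $\ell_2$-regularized least-squares problem. It is a minimal illustration only, not the authors' exact HSDMPG procedure: the minibatch growth schedule, inner solver, step size, and names (prox_l2, hsdmpg_sketch) are assumptions made for this example.

import numpy as np

def prox_l2(w, lam, eta):
    # Proximal operator of the l2 regularizer (lam/2)*||w||^2 with step size eta:
    # argmin_z (lam/2)*||z||^2 + (1/(2*eta))*||z - w||^2 has this closed form.
    return w / (1.0 + eta * lam)

def hsdmpg_sketch(X, y, lam=1e-3, eta=0.1, n_stages=10, b0=8, growth=2.0,
                  inner_steps=20, seed=0):
    # Illustrative evolving-minibatch proximal gradient loop (hypothetical
    # parameters): each stage samples a growing minibatch of individual losses
    # and approximately minimizes the sampled subproblem.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    b = b0
    for _ in range(n_stages):
        # Sample an evolving minibatch of individual losses.
        idx = rng.choice(n, size=min(int(b), n), replace=False)
        Xb, yb = X[idx], y[idx]
        for _ in range(inner_steps):
            grad = Xb.T @ (Xb @ w - yb) / len(idx)  # gradient of the sampled quadratic loss
            w = prox_l2(w - eta * grad, lam, eta)   # proximal step on the regularizer
        b *= growth  # enlarge the minibatch so the subproblem tracks the full objective
    return w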
DOI: 10.1109/TPAMI.2021.3087328