A Hybrid Stochastic-Deterministic Minibatch Proximal Gradient Method for Efficient Optimization and Generalization

Bibliographic Details
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 44, No. 10, pp. 5933-5946
Main Authors: Zhou, Pan, Yuan, Xiao-Tong, Lin, Zhouchen, Hoi, Steven C.H.
Format: Journal Article
Language:English
Published: United States IEEE 01.10.2022
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:
ISSN:0162-8828, 1939-3539, 2160-9292
Online Access:Get full text
Abstract Despite the success of stochastic variance-reduced gradient (SVRG) algorithms in solving large-scale problems, their stochastic gradient complexity often scales linearly with data size and is expensive for huge data. Accordingly, we propose a hybrid stochastic-deterministic minibatch proximal gradient (HSDMPG) algorithm for strongly convex problems with linear prediction structure, e.g., least squares and logistic/softmax regression. HSDMPG enjoys improved computational complexity that is data-size-independent for large-scale problems. It iteratively samples an evolving minibatch of individual losses to estimate the original problem, and can efficiently minimize the sampled subproblems. For a strongly convex loss of $n$ components, HSDMPG attains an $\epsilon$-optimization error within $\mathcal{O}\!\left(\kappa \log^{\zeta+1}\!\left(\frac{1}{\epsilon}\right)\frac{1}{\epsilon} \,\wedge\, n\log^{\zeta}\!\left(\frac{1}{\epsilon}\right)\right)$ stochastic gradient evaluations, where $\kappa$ is the condition number, $\zeta=1$ for quadratic loss, and $\zeta=2$ for generic loss. For large-scale problems, our complexity outperforms those of SVRG-type algorithms with or without dependence on data size.
In particular, when $\epsilon=\mathcal{O}(1/\sqrt{n})$, which matches the intrinsic excess error of a learning model and is sufficient for generalization, our complexity for quadratic and generic losses is respectively $\mathcal{O}(n^{0.5}\log^{2}(n))$ and $\mathcal{O}(n^{0.5}\log^{3}(n))$, which for the first time achieves optimal generalization in less than a single pass over the data. Besides, we extend HSDMPG to online strongly convex problems and prove its higher efficiency over prior algorithms. Numerical results demonstrate the computational advantages of HSDMPG.
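For orientation only: substituting $\epsilon=\Theta(1/\sqrt{n})$ into the first term of the stated bound gives roughly $\kappa\, n^{0.5}\log^{\zeta+1}(n)$ stochastic gradient evaluations, which, treating $\kappa$ as a constant, matches the quoted $\mathcal{O}(n^{0.5}\log^{2}(n))$ and $\mathcal{O}(n^{0.5}\log^{3}(n))$ rates. The Python sketch below illustrates the evolving-minibatch idea summarized in the abstract on a ridge-regularized least-squares problem; it is a sketch under stated assumptions, not the authors' HSDMPG implementation, and the doubling batch schedule, inner-iteration count, and step size are illustrative choices.

# Illustrative sketch of an evolving-minibatch proximal-gradient scheme.
# NOT the authors' HSDMPG; the batch schedule, inner iterations, and step
# size are assumptions chosen for readability.
import numpy as np

def evolving_minibatch_prox_grad(A, b, lam=1e-2, n_stages=8, inner_iters=50, seed=0):
    """Approximately minimize (1/2n)||Ax - b||^2 + (lam/2)||x||^2 by solving a
    sequence of sampled subproblems whose minibatch grows geometrically."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    batch = max(1, n // 2 ** n_stages)              # assumed initial batch size
    for _ in range(n_stages):
        idx = rng.choice(n, size=min(batch, n), replace=False)
        A_s, b_s = A[idx], b[idx]
        m = len(idx)
        L = np.linalg.norm(A_s, 2) ** 2 / m + lam   # smoothness constant of the subproblem
        for _ in range(inner_iters):                # inexact subproblem solve
            grad = A_s.T @ (A_s @ x - b_s) / m + lam * x
            x -= grad / L                           # gradient step; the regularizer is smooth,
                                                    # so the proximal map is the identity here
        batch *= 2                                  # assumed doubling schedule
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((2000, 20))
    x_true = rng.standard_normal(20)
    b = A @ x_true + 0.01 * rng.standard_normal(2000)
    print("parameter error:", np.linalg.norm(evolving_minibatch_prox_grad(A, b) - x_true))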
Author Lin, Zhouchen
Zhou, Pan
Yuan, Xiao-Tong
Hoi, Steven C.H.
Author_xml – sequence: 1
  givenname: Pan
  orcidid: 0000-0003-3400-8943
  surname: Zhou
  fullname: Zhou, Pan
  email: panzhou3@gmail.com
  organization: Salesforce, Sea AI Lab of Sea Group, Singapore, Singapore
– sequence: 2
  givenname: Xiao-Tong
  orcidid: 0000-0002-7151-8806
  surname: Yuan
  fullname: Yuan, Xiao-Tong
  email: xtyuan@nuist.edu.cn
  organization: School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing, China
– sequence: 3
  givenname: Zhouchen
  orcidid: 0000-0003-1493-7569
  surname: Lin
  fullname: Lin, Zhouchen
  email: zlin@pku.edu.cn
  organization: Key Lab. of Machine Perception (MoE), School of EECS, Peking University, Beijing, China
– sequence: 4
  givenname: Steven C.H.
  surname: Hoi
  fullname: Hoi, Steven C.H.
  email: shoi@salesforce.com
  organization: Salesforce Research, Singapore, Singapore
BackLink https://www.ncbi.nlm.nih.gov/pubmed/34101583 (View this record in MEDLINE/PubMed)
CODEN ITPIDJ
CitedBy_id crossref_primary_10_1109_TPAMI_2024_3382294
crossref_primary_10_1007_s00607_024_01362_2
crossref_primary_10_4018_JGIM_293286
Cites_doi 10.1109/TPAMI.2019.2954874
10.1561/2200000018
10.1007/0-387-34239-7
10.7551/mitpress/8996.003.0015
10.1007/978-1-4419-8853-9
10.1007/s10107-014-0839-0
10.1145/3055399.3055448
10.1214/aoms/1177729586
10.1109/MSP.2010.936020
10.23919/ACC.2019.8814680
10.1109/TPAMI.2008.79
10.1109/CVPR.2017.419
10.1017/cbo9780511804458
10.1111/j.2517-6161.1996.tb02080.x
10.1201/b18401
10.1017/CBO9781107298019
10.1201/9781315366920
10.1137/110830629
10.1007/s10107-012-0573-4
10.1162/153244302760200704
10.1109/TPAMI.2013.57
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022
DBID 97E
RIA
RIE
AAYXX
CITATION
NPM
7SC
7SP
8FD
JQ2
L7M
L~C
L~D
7X8
DOI 10.1109/TPAMI.2021.3087328
DatabaseName IEEE All-Society Periodicals Package (ASPP) 2005-present
IEEE All-Society Periodicals Package (ASPP) 1998-Present
IEEE Electronic Library (IEL)
CrossRef
PubMed
Computer and Information Systems Abstracts
Electronics & Communications Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
MEDLINE - Academic
DatabaseTitle CrossRef
PubMed
Technology Research Database
Computer and Information Systems Abstracts – Academic
Electronics & Communications Abstracts
ProQuest Computer Science Collection
Computer and Information Systems Abstracts
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts Professional
MEDLINE - Academic
DatabaseTitleList PubMed
MEDLINE - Academic

Technology Research Database
Database_xml – sequence: 1
  dbid: NPM
  name: PubMed
  url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
– sequence: 2
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
– sequence: 3
  dbid: 7X8
  name: MEDLINE - Academic
  url: https://search.proquest.com/medline
  sourceTypes: Aggregation Database
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
Computer Science
EISSN 2160-9292
1939-3539
EndPage 5946
ExternalDocumentID 34101583
10_1109_TPAMI_2021_3087328
9448388
Genre orig-research
Journal Article
GrantInformation_xml – fundername: PKU-Baidu Fund
  grantid: 2020BD006
– fundername: National Natural Science Foundation of China; Natural Science Foundation of China
  grantid: 61876090; 61936005
  funderid: 10.13039/501100001809
– fundername: National Key Research and Development Program of China
  grantid: 2018AAA0100400
– fundername: NSF China
  grantid: 61625301; 61731018
– fundername: Key-Area Research and Development Program of Guangdong Province
  grantid: 2019B121204008
GroupedDBID ---
-DZ
-~X
.DC
0R~
29I
4.4
53G
5GY
6IK
97E
AAJGR
AARMG
AASAJ
AAWTH
ABAZT
ABQJQ
ABVLG
ACGFO
ACGFS
ACIWK
ACNCT
AENEX
AGQYO
AHBIQ
AKJIK
AKQYR
ALMA_UNASSIGNED_HOLDINGS
ASUFR
ATWAV
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CS3
DU5
E.L
EBS
EJD
F5P
HZ~
IEDLZ
IFIPE
IPLJI
JAVBF
LAI
M43
MS~
O9-
OCL
P2P
PQQKQ
RIA
RIE
RNS
RXW
TAE
TN5
UHB
~02
AAYXX
CITATION
NPM
7SC
7SP
8FD
JQ2
L7M
L~C
L~D
7X8
ID FETCH-LOGICAL-c351t-6df4be50f7073aea60b78331c0563e0fb53883fa41c7868bbc8612f77ff055bc3
IEDL.DBID RIE
ISICitedReferencesCount 3
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000853875300010&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
ISSN 0162-8828
1939-3539
IngestDate Sat Sep 27 17:18:51 EDT 2025
Sun Nov 30 05:32:19 EST 2025
Sun Nov 09 08:39:31 EST 2025
Sat Nov 29 08:04:19 EST 2025
Tue Nov 18 22:30:36 EST 2025
Wed Aug 27 02:04:21 EDT 2025
IsPeerReviewed true
IsScholarly true
Issue 10
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-c351t-6df4be50f7073aea60b78331c0563e0fb53883fa41c7868bbc8612f77ff055bc3
Notes ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
content type line 23
ORCID 0000-0002-7151-8806
0000-0003-1493-7569
0000-0003-3400-8943
PMID 34101583
PQID 2714892241
PQPubID 85458
PageCount 14
ParticipantIDs proquest_journals_2714892241
crossref_citationtrail_10_1109_TPAMI_2021_3087328
pubmed_primary_34101583
crossref_primary_10_1109_TPAMI_2021_3087328
proquest_miscellaneous_2539526923
ieee_primary_9448388
PublicationCentury 2000
PublicationDate 2022-10-01
PublicationDateYYYYMMDD 2022-10-01
PublicationDate_xml – month: 10
  year: 2022
  text: 2022-10-01
  day: 01
PublicationDecade 2020
PublicationPlace United States
PublicationPlace_xml – name: United States
– name: New York
PublicationTitle IEEE transactions on pattern analysis and machine intelligence
PublicationTitleAbbrev TPAMI
PublicationTitleAlternate IEEE Trans Pattern Anal Mach Intell
PublicationYear 2022
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
References ref13
ref15
ref11
ref10
Boyarshinov (ref12) 2005
ref16
Zhou (ref40)
Lin (ref29)
Hardt (ref44)
ref47
Defazio (ref17)
Mokhtari (ref41)
Yuan (ref49) 2020
Zhang (ref30)
Zhou (ref39)
ref8
ref7
Feldman (ref48)
ref9
ref4
ref3
ref6
ref5
Shamir (ref43)
Hendrikx (ref36) 2019
Mokhtari (ref42)
ref37
ref2
Nitanda (ref31)
Shamir (ref27) 2011
ref38
Zhou (ref34)
Bottou (ref35); 91
Zhou (ref46)
Lin (ref19)
Dieuleveut (ref32) 2017; 18
ref24
ref23
ref26
Lei (ref20)
Cauchy (ref14) 1847; 25
ref21
ref28
Bach (ref33)
Zhou (ref45)
Lan (ref22)
Johnson (ref18)
Monga (ref1) 2017
Shalev-Shwartz (ref25)
References_xml – start-page: 113
  volume-title: Proc. Conf. Learn. Theory
  ident: ref25
  article-title: Stochastic convex optimization
– start-page: 1234
  volume-title: Proc. Conf. Neural Inf. Process. Syst.
  ident: ref40
  article-title: New insight into hybrid stochastic gradient descent: Beyond with-replacement sampling and convexity
– start-page: 4062
  volume-title: Proc. Conf. Neural Inf. Process. Syst.
  ident: ref41
  article-title: Adaptive newton method for empirical risk minimization to statistical accuracy
– ident: ref4
  doi: 10.1109/TPAMI.2019.2954874
– start-page: 11556
  volume-title: Proc. Int. Conf. Mach. Learn.
  ident: ref34
  article-title: Hybrid stochastic-deterministic minibatch proximal gradient: Less-than-single-pass optimization with nearly optimal generalization
– start-page: 195
  volume-title: Proc. Artif. Intell. Statist.
  ident: ref31
  article-title: Accelerated stochastic gradient descent for minimizing finite sums
– start-page: 1646
  volume-title: Proc. Conf. Neural Inf. Process. Syst.
  ident: ref17
  article-title: SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives
– year: 2011
  ident: ref27
  article-title: Making gradient descent optimal for strongly convex stochastic optimization
– start-page: 353
  volume-title: Proc. Int. Conf. Mach. Learn.
  ident: ref30
  article-title: Stochastic primal-dual coordinate method for regularized empirical risk minimization
– volume: 91
  start-page: 12
  issue: 8
  volume-title: Proc. Neuro-Nîmes
  ident: ref35
  article-title: Stochastic gradient learning in neural networks
– ident: ref16
  doi: 10.1561/2200000018
– start-page: 148
  volume-title: Proc. Artif. Intell. Statist.
  ident: ref20
  article-title: Less than a single pass: Stochastically controlled stochastic gradient
– start-page: 1
  year: 2020
  ident: ref49
  article-title: On convergence of distributed approximate Newton methods: Globalization, sharper bounds and beyond
  publication-title: J. Mach. Learn. Res.
– volume: 25
  start-page: 536
  year: 1847
  ident: ref14
  article-title: Méthode générale pour la résolution des systèmes d'équations simultanées
  publication-title: Comptes rendus des séances de l'Académie des sciences de Paris
– ident: ref24
  doi: 10.1007/0-387-34239-7
– start-page: 773
  volume-title: Proc. Conf. Neural Inf. Process. Syst.
  ident: ref33
  article-title: Non-strongly-convex smooth stochastic approximation with convergence rate O (1/n)
– ident: ref23
  doi: 10.7551/mitpress/8996.003.0015
– start-page: 2060
  volume-title: Proc. Conf. Neural Inf. Process. Syst.
  ident: ref42
  article-title: First-order adaptive sample size methods to reduce complexity of empirical risk minimization
– ident: ref10
  doi: 10.1007/978-1-4419-8853-9
– ident: ref28
  doi: 10.1007/s10107-014-0839-0
– ident: ref21
  doi: 10.1145/3055399.3055448
– ident: ref15
  doi: 10.1214/aoms/1177729586
– year: 2005
  ident: ref12
  article-title: Machine learning in computational finance
– ident: ref7
  doi: 10.1109/MSP.2010.936020
– ident: ref37
  doi: 10.23919/ACC.2019.8814680
– start-page: 315
  volume-title: Proc. Conf. Neural Inf. Process. Syst.
  ident: ref18
  article-title: Accelerating stochastic gradient descent using predictive variance reduction
– ident: ref2
  doi: 10.1109/TPAMI.2008.79
– start-page: 1225
  volume-title: Proc. Int. Conf. Mach. Learn.
  ident: ref44
  article-title: Train faster, generalize better: Stability of stochastic gradient descent
– ident: ref5
  doi: 10.1109/CVPR.2017.419
– ident: ref6
  doi: 10.1017/cbo9780511804458
– volume: 18
  start-page: 3520
  issue: 1
  year: 2017
  ident: ref32
  article-title: Harder, better, faster, stronger convergence rates for least-squares regression
  publication-title: J. Mach. Learn. Res.
– ident: ref9
  doi: 10.1111/j.2517-6161.1996.tb02080.x
– ident: ref11
  doi: 10.1201/b18401
– ident: ref26
  doi: 10.1017/CBO9781107298019
– ident: ref8
  doi: 10.1201/9781315366920
– start-page: 1
  volume-title: Proc. Conf. Neural Inf. Process. Syst.
  ident: ref39
  article-title: Efficient stochastic gradient hard thresholding
– start-page: 5960
  volume-title: Proc. Int. Conf. Mach. Learn.
  ident: ref45
  article-title: Understanding generalization and optimization performance of deep CNNs
– start-page: 1000
  volume-title: Proc. Int. Conf. Mach. Learn.
  ident: ref43
  article-title: Communication-efficient distributed optimization using an approximate newton-type method
– ident: ref38
  doi: 10.1137/110830629
– ident: ref13
  doi: 10.1007/s10107-012-0573-4
– start-page: 10462
  volume-title: Proc. Conf. Neural Inf. Process. Syst.
  ident: ref22
  article-title: A unified variance-reduced accelerated gradient method for convex optimization
– ident: ref47
  doi: 10.1162/153244302760200704
– start-page: 1
  volume-title: Proc. Int. Conf. Learn. Representations
  ident: ref46
  article-title: Empirical risk landscape analysis for understanding deep neural networks
– volume-title: Handbook of Convex Optimization Methods in Imaging Science
  year: 2017
  ident: ref1
– year: 2019
  ident: ref36
  article-title: Asynchronous accelerated proximal stochastic gradient for strongly convex distributed finite sums
– start-page: 1270
  volume-title: Proc. Conf. Learn. Theory
  ident: ref48
  article-title: High probability generalization bounds for uniformly stable algorithms with nearly optimal rate
– start-page: 3059
  volume-title: Proc. Conf. Neural Inf. Process. Syst.
  ident: ref29
  article-title: An accelerated proximal coordinate gradient method
– ident: ref3
  doi: 10.1109/TPAMI.2013.57
– start-page: 3384
  volume-title: Proc. Conf. Neural Inf. Process. Syst.
  ident: ref19
  article-title: A universal catalyst for first-order optimization
SSID ssj0014503
Score 2.426824
Snippet Despite the success of stochastic variance-reduced gradient (SVRG) algorithms in solving large-scale problems, their stochastic gradient complexity often...
SourceID proquest
pubmed
crossref
ieee
SourceType Aggregation Database
Index Database
Enrichment Source
Publisher
StartPage 5933
SubjectTerms Algorithms
Catalysts
Complexity
Computational complexity
Computational modeling
Convex optimization
Linear prediction
online convex optimization
Optimization
precondition
Prediction algorithms
Signal processing algorithms
Stochastic processes
stochastic variance-reduced algorithm
Title A Hybrid Stochastic-Deterministic Minibatch Proximal Gradient Method for Efficient Optimization and Generalization
URI https://ieeexplore.ieee.org/document/9448388
https://www.ncbi.nlm.nih.gov/pubmed/34101583
https://www.proquest.com/docview/2714892241
https://www.proquest.com/docview/2539526923
Volume 44
WOSCitedRecordID wos000853875300010&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVIEE
  databaseName: IEEE Electronic Library (IEL)
  customDbUrl:
  eissn: 2160-9292
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0014503
  issn: 0162-8828
  databaseCode: RIE
  dateStart: 19790101
  isFulltext: true
  titleUrlDefault: https://ieeexplore.ieee.org/
  providerName: IEEE
linkProvider IEEE
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=A+Hybrid+Stochastic-Deterministic+Minibatch+Proximal+Gradient+Method+for+Efficient+Optimization+and+Generalization&rft.jtitle=IEEE+transactions+on+pattern+analysis+and+machine+intelligence&rft.au=Zhou%2C+Pan&rft.au=Yuan%2C+Xiao-Tong&rft.au=Lin%2C+Zhouchen&rft.au=Hoi%2C+Steven+C.H.&rft.date=2022-10-01&rft.issn=0162-8828&rft.eissn=2160-9292&rft.volume=44&rft.issue=10&rft.spage=5933&rft.epage=5946&rft_id=info:doi/10.1109%2FTPAMI.2021.3087328&rft.externalDBID=n%2Fa&rft.externalDocID=10_1109_TPAMI_2021_3087328