A Hybrid Stochastic-Deterministic Minibatch Proximal Gradient Method for Efficient Optimization and Generalization
| Published in: | IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 44, No. 10, pp. 5933-5946 |
|---|---|
| Main Authors: | Zhou, Pan; Yuan, Xiao-Tong; Lin, Zhouchen; Hoi, Steven C.H. |
| Format: | Journal Article |
| Language: | English |
| Published: | United States: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.10.2022 |
| Subjects: | Algorithms; Complexity; Computational complexity; Computational modeling; Convex optimization; Linear prediction; online convex optimization; Optimization; precondition; Prediction algorithms; Signal processing algorithms; Stochastic processes; stochastic variance-reduced algorithm |
| ISSN: | 0162-8828, 1939-3539 (print); 1939-3539, 2160-9292 (electronic) |
| Abstract | Despite the success of stochastic variance-reduced gradient (SVRG) algorithms in solving large-scale problems, their stochastic gradient complexity often scales linearly with data size and is expensive for huge data. Accordingly, we propose a hybrid stochastic-deterministic minibatch proximal gradient (HSDMPG) algorithm for strongly convex problems with linear prediction structure, e.g., least squares and logistic/softmax regression. HSDMPG enjoys improved computational complexity that is data-size-independent for large-scale problems. It iteratively samples an evolving minibatch of individual losses to estimate the original problem, and efficiently minimizes the sampled subproblems. For a strongly convex loss of $n$ components, HSDMPG attains an $\epsilon$-optimization error within $\mathcal{O}\left(\kappa \log^{\zeta+1}\left(\frac{1}{\epsilon}\right)\frac{1}{\epsilon} \wedge n \log^{\zeta}\left(\frac{1}{\epsilon}\right)\right)$ stochastic gradient evaluations, where $\kappa$ is the condition number, $\wedge$ denotes the minimum of the two terms, $\zeta = 1$ for quadratic loss, and $\zeta = 2$ for generic loss. For large-scale problems, this complexity outperforms those of SVRG-type algorithms with or without dependence on data size. In particular, when $\epsilon = \mathcal{O}(1/\sqrt{n})$, which matches the intrinsic excess error of a learning model and is sufficient for generalization, our complexity for quadratic and generic losses is respectively $\mathcal{O}(n^{0.5}\log^{2}(n))$ and $\mathcal{O}(n^{0.5}\log^{3}(n))$, which for the first time achieves optimal generalization in less than a single pass over the data. Besides, we extend HSDMPG to online strongly convex problems and prove its higher efficiency over prior algorithms. Numerical results demonstrate the computational advantages of HSDMPG. |
|---|---|
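As a sanity check on the quoted rates (treating the condition number $\kappa$ as a constant, as the abstract does), one can substitute $\epsilon = 1/\sqrt{n}$ into the complexity bound; for large $n$ the first term of the minimum is the smaller one, and for quadratic loss ($\zeta = 1$) it recovers the stated sub-single-pass cost:

```latex
\mathcal{O}\!\left(\kappa \log^{\zeta+1}\!\Big(\frac{1}{\epsilon}\Big)\frac{1}{\epsilon}\right)
\Big|_{\zeta = 1,\; \epsilon = 1/\sqrt{n}}
= \mathcal{O}\!\left(\kappa\,\sqrt{n}\,\log^{2}\!\big(\sqrt{n}\big)\right)
= \mathcal{O}\!\left(n^{0.5}\log^{2}(n)\right),
\qquad \text{since } \log\!\big(\sqrt{n}\big) = \tfrac{1}{2}\log n .
```

The same substitution with $\zeta = 2$ yields the $\mathcal{O}(n^{0.5}\log^{3}(n))$ rate quoted for generic losses; both are $o(n)$, i.e., strictly less than one pass over the $n$ component losses.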
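To make the algorithmic idea in the abstract concrete, below is a minimal, illustrative sketch of a hybrid stochastic-deterministic minibatch proximal-gradient loop for ridge-regularized least squares (the abstract's canonical example of a strongly convex loss with linear prediction structure). The geometric batch-growth schedule, the inner solver, and all parameter names (`batch0`, `growth`, `inner_iters`) are assumptions made for illustration, not the authors' exact HSDMPG procedure.

```python
import numpy as np

def hsdmpg_sketch(A, b, lam=1e-2, outer_iters=15, batch0=32,
                  growth=2.0, inner_iters=50, seed=0):
    """Illustrative hybrid stochastic-deterministic loop: repeatedly sample an
    evolving (growing) minibatch of individual losses to estimate
    f(w) = (1/2n)||Aw - b||^2 + (lam/2)||w||^2, then approximately minimize
    the sampled subproblem with gradient steps (for this smooth l2 regularizer
    the proximal step reduces to a plain gradient step)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w = np.zeros(d)
    m = batch0
    for _ in range(outer_iters):
        # Sample a minibatch of component losses; the batch grows each round,
        # so the subproblem becomes an ever-sharper estimate of the full problem.
        idx = rng.choice(n, size=min(int(m), n), replace=False)
        As, bs = A[idx], b[idx]
        # Smoothness constant of the sampled quadratic subproblem's gradient.
        L = np.linalg.norm(As, 2) ** 2 / len(idx) + lam
        for _ in range(inner_iters):
            g = As.T @ (As @ w - bs) / len(idx) + lam * w
            w -= g / L
        m *= growth
    return w

# Toy usage: recover a planted weight vector from noisy linear measurements.
rng = np.random.default_rng(1)
A = rng.standard_normal((2000, 20))
w_true = rng.standard_normal(20)
b = A @ w_true + 0.01 * rng.standard_normal(2000)
print(np.linalg.norm(hsdmpg_sketch(A, b) - w_true))  # small residual error
```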
| Author | Zhou, Pan; Yuan, Xiao-Tong; Lin, Zhouchen; Hoi, Steven C.H. |
| Author_xml | 1. Zhou, Pan (ORCID 0000-0003-3400-8943, panzhou3@gmail.com), Salesforce, Sea AI Lab of Sea Group, Singapore, Singapore. 2. Yuan, Xiao-Tong (ORCID 0000-0002-7151-8806, xtyuan@nuist.edu.cn), School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing, China. 3. Lin, Zhouchen (ORCID 0000-0003-1493-7569, zlin@pku.edu.cn), Key Lab. of Machine Perception (MoE), School of EECS, Peking University, Beijing, China. 4. Hoi, Steven C.H. (shoi@salesforce.com), Salesforce Research, Singapore, Singapore. |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/34101583 (View this record in MEDLINE/PubMed) |
| CODEN | ITPIDJ |
| CitedBy_id | 10.1109/TPAMI.2024.3382294; 10.1007/s00607-024-01362-2; 10.4018/JGIM.293286 |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022 |
| DOI | 10.1109/TPAMI.2021.3087328 |
| DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005-present IEEE All-Society Periodicals Package (ASPP) 1998-Present IEEE Electronic Library (IEL) CrossRef PubMed Computer and Information Systems Abstracts Electronics & Communications Abstracts Technology Research Database ProQuest Computer Science Collection Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional MEDLINE - Academic |
| Discipline | Engineering; Computer Science |
| EISSN | 2160-9292 1939-3539 |
| EndPage | 5946 |
| ExternalDocumentID | 34101583 (PubMed); 10.1109/TPAMI.2021.3087328 (Crossref); 9448388 (IEEE) |
| Genre | orig-research Journal Article |
| GrantInformation_xml | PKU-Baidu Fund (2020BD006); National Natural Science Foundation of China (61876090, 61936005; funder ID 10.13039/501100001809); National Key Research and Development Program of China (2018AAA0100400); NSF China (61625301, 61731018); Key-Area Research and Development Program of Guangdong Province (2019B121204008) |
| ISICitedReferencesCount | 3 |
| ISSN | 0162-8828 1939-3539 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 10 |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
| ORCID | 0000-0002-7151-8806 0000-0003-1493-7569 0000-0003-3400-8943 |
| PMID | 34101583 |
| PQID | 2714892241 |
| PQPubID | 85458 |
| PageCount | 14 |
| PublicationDate | 2022-10-01 |
| PublicationPlace | United States |
| PublicationTitle | IEEE transactions on pattern analysis and machine intelligence |
| PublicationTitleAbbrev | TPAMI |
| PublicationTitleAlternate | IEEE Trans Pattern Anal Mach Intell |
| PublicationYear | 2022 |
| Publisher | IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| SourceID | proquest pubmed crossref ieee |
| SourceType | Aggregation Database; Index Database; Enrichment Source; Publisher |
| StartPage | 5933 |
| SubjectTerms | Algorithms; Complexity; Computational complexity; Computational modeling; Convex optimization; Linear prediction; online convex optimization; Optimization; precondition; Prediction algorithms; Signal processing algorithms; Stochastic processes; stochastic variance-reduced algorithm |
| Title | A Hybrid Stochastic-Deterministic Minibatch Proximal Gradient Method for Efficient Optimization and Generalization |
| URI | https://ieeexplore.ieee.org/document/9448388 https://www.ncbi.nlm.nih.gov/pubmed/34101583 https://www.proquest.com/docview/2714892241 https://www.proquest.com/docview/2539526923 |
| Volume | 44 |