Multi-stage stochastic gradient method with momentum acceleration
•Stage-wise optimization and momentum have been widely employed to accelerate SGD. •Negative momentum provides acceleration and stabilization for stochastic first-order methods. •Negative momentum extends Nesterov's momentum to stage-wise optimization. •Gradient correction avoids oscillations and makes stochastic gradients more effective and tolerant.
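The highlights describe a general recipe: run a stochastic gradient method in stages, restart each stage from the solution returned by the previous stage, and add a momentum term for acceleration. Below is a minimal illustrative sketch of that generic multi-stage scheme in Python; the function names, the stage-wise step-size halving, and the heavy-ball-style momentum update are assumptions chosen for illustration and are not the paper's MAGNET algorithm or its negative-momentum update.

```python
# Illustrative sketch (not the paper's MAGNET): multi-stage SGD with a
# heavy-ball-style momentum term, where each stage restarts from the
# previous stage's output with a smaller step size.
import numpy as np

def multistage_sgd_momentum(grad, x0, n_stages=5, iters_per_stage=200,
                            eta0=0.1, beta=0.9, rng=None):
    """Run `n_stages` stages; stage s restarts from the previous solution
    with step size eta0 / 2**s. `grad(x, rng)` returns a stochastic gradient."""
    if rng is None:
        rng = np.random.default_rng(0)
    x = x0.copy()
    for s in range(n_stages):
        eta = eta0 / 2 ** s          # stage-wise shrinking step size
        v = np.zeros_like(x)         # momentum buffer, reset at each stage
        for _ in range(iters_per_stage):
            g = grad(x, rng)
            v = beta * v - eta * g   # momentum update
            x = x + v
    return x

if __name__ == "__main__":
    # Toy stochastic least-squares problem: min_x 0.5 * E||A x - b||^2
    rng = np.random.default_rng(42)
    A = rng.standard_normal((500, 20))
    x_true = rng.standard_normal(20)
    b = A @ x_true + 0.01 * rng.standard_normal(500)

    def stochastic_grad(x, rng, batch=32):
        # Mini-batch gradient of the least-squares objective
        idx = rng.integers(0, A.shape[0], size=batch)
        Ab, bb = A[idx], b[idx]
        return Ab.T @ (Ab @ x - bb) / batch

    x_hat = multistage_sgd_momentum(stochastic_grad, np.zeros(20))
    print("estimation error:", np.linalg.norm(x_hat - x_true))
```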
Saved in:
| Published in: | Signal Processing, Vol. 188, p. 108201 |
|---|---|
| Main authors: | Luo, Zhijian; Chen, Siyu; Qian, Yuntao; Hou, Yueen |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V., 01.11.2021 |
| Subjects: | Stochastic gradient descent; Convex optimization; Multi-stage; Momentum acceleration |
| ISSN: | 0165-1684, 1872-7557 |
| Online access: | Full text |
| Abstract | •Stage-wise optimization and momentum have been widely employed to accelerate SGD. •Negative momentum provides acceleration and stabilization for stochastic first-order methods. •Negative momentum extends Nesterov's momentum to stage-wise optimization. •Gradient correction avoids oscillations and makes stochastic gradients more effective and tolerant.
Multi-stage optimization, which restarts a stochastic algorithm from the solution returned by the previous stage, has been widely employed in stochastic optimization. Momentum acceleration is well known for yielding gradient-based algorithms with fast convergence in large-scale optimization. To exploit this acceleration in multi-stage stochastic optimization, we develop a multi-stage stochastic gradient descent method with momentum acceleration, named MAGNET, for first-order stochastic convex optimization. Its main ingredient is a negative momentum, which extends Nesterov's momentum to multi-stage optimization: it can be incorporated into a stochastic gradient-based algorithm within the multi-stage mechanism and provides acceleration. The proposed algorithm attains an accelerated rate of convergence, is adaptive, and is free from hyper-parameter tuning. Experimental results demonstrate that our algorithm is competitive with state-of-the-art methods on several typical optimization problems in machine learning. |
|---|---|
| ArticleNumber | 108201 |
| Author | Luo, Zhijian; Qian, Yuntao; Chen, Siyu; Hou, Yueen |
| Author_xml | – sequence: 1; givenname: Zhijian; surname: Luo; fullname: Luo, Zhijian; email: luozhijian@zju.edu.cn; organization: Institute of Artificial Intelligence, College of Computer Science and Technology, Zhejiang University, Hangzhou, 310027, China
– sequence: 2; givenname: Siyu; surname: Chen; fullname: Chen, Siyu; email: sychen@zju.edu.cn; organization: Institute of Artificial Intelligence, College of Computer Science and Technology, Zhejiang University, Hangzhou, 310027, China
– sequence: 3; givenname: Yuntao; surname: Qian; fullname: Qian, Yuntao; email: ytqian@zu.edu.cn; organization: Institute of Artificial Intelligence, College of Computer Science and Technology, Zhejiang University, Hangzhou, 310027, China
– sequence: 4; givenname: Yueen; surname: Hou; fullname: Hou, Yueen; email: houyueen@jyu.edu.cn; organization: School of Computer, Jiaying University, Meizhou, 514015, China |
| ContentType | Journal Article |
| Copyright | 2021 Elsevier B.V. |
| DOI | 10.1016/j.sigpro.2021.108201 |
| Discipline | Engineering |
| EISSN | 1872-7557 |
| ExternalDocumentID | 10_1016_j_sigpro_2021_108201 S0165168421002395 |
| ISICitedReferencesCount | 5 |
| ISSN | 0165-1684 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | Stochastic gradient descent; Convex optimization; Multi-stage; Momentum acceleration |
| Language | English |
| PublicationDate | November 2021 |
| PublicationTitle | Signal processing |
| PublicationYear | 2021 |
| Publisher | Elsevier B.V |
| StartPage | 108201 |
| SubjectTerms | Convex optimization; Momentum acceleration; Multi-stage; Stochastic gradient descent |
| Title | Multi-stage stochastic gradient method with momentum acceleration |
| URI | https://dx.doi.org/10.1016/j.sigpro.2021.108201 |
| Volume | 188 |