Multi-stage stochastic gradient method with momentum acceleration

• Stage-wise optimization and momentum have been widely employed to accelerate SGD.
• Negative momentum provides acceleration and stabilization for stochastic first-order methods.
• Negative momentum extends Nesterov's momentum to stage-wise optimization.
• Gradient correction avoids oscillations and makes stochastic gradients more effective and tolerant.

Bibliographic details
Published in: Signal Processing, Vol. 188, Art. no. 108201
Main authors: Luo, Zhijian; Chen, Siyu; Qian, Yuntao; Hou, Yueen
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.11.2021
Subjects: Stochastic gradient descent; Convex optimization; Multi-stage; Momentum acceleration
ISSN: 0165-1684, 1872-7557
Online access: Full text
Abstract Multi-stage optimization, in which a stochastic algorithm is restarted from the solution returned by the previous stage, has been widely employed in stochastic optimization. Momentum is a well-known technique for building gradient-based algorithms with fast convergence in large-scale optimization. To exploit this acceleration in multi-stage stochastic optimization, we develop a multi-stage stochastic gradient descent method with momentum acceleration, named MAGNET, for first-order stochastic convex optimization. The main ingredient is a negative momentum, which extends Nesterov's momentum to multi-stage optimization. It can be incorporated into a stochastic gradient-based algorithm within a multi-stage mechanism and provides acceleration. The proposed algorithm obtains an accelerated rate of convergence, is adaptive, and is free from hyper-parameter tuning. Experimental results demonstrate that our algorithm is competitive with state-of-the-art methods on several typical optimization problems in machine learning.
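The abstract describes the mechanism only at a high level: run a momentum-accelerated stochastic gradient stage, return its (averaged) solution, and restart the next stage from it with adjusted parameters. The sketch below is a minimal illustration of that generic multi-stage pattern, using standard Nesterov-style momentum, a halving step size, and a doubling stage length; it is not the paper's MAGNET algorithm or its negative-momentum update, and all function names, schedules, and constants are assumptions made for illustration.

```python
# Minimal sketch (not the authors' MAGNET): a generic multi-stage stochastic
# gradient method with Nesterov-style momentum. Each stage restarts from the
# averaged solution of the previous stage with a smaller step size.
import numpy as np

def stochastic_grad(w, X, y, batch=32, rng=np.random.default_rng(0)):
    """Mini-batch gradient of the least-squares objective 0.5*||Xw - y||^2 / n."""
    idx = rng.choice(len(y), size=batch, replace=False)
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ w - yb) / batch

def momentum_stage(w0, X, y, lr, beta=0.9, iters=200):
    """One stage of SGD with Nesterov-style momentum; returns the averaged iterate."""
    w, v = w0.copy(), np.zeros_like(w0)
    avg = np.zeros_like(w0)
    for t in range(iters):
        g = stochastic_grad(w + beta * v, X, y)   # gradient at the look-ahead point
        v = beta * v - lr * g                     # momentum (velocity) update
        w = w + v
        avg += (w - avg) / (t + 1)                # running average of iterates
    return avg

def multi_stage_sgd(X, y, stages=5, lr0=0.1, iters0=200):
    """Restart each stage from the previous solution; halve the step size, double the length."""
    w = np.zeros(X.shape[1])
    for s in range(stages):
        w = momentum_stage(w, X, y, lr=lr0 / 2**s, iters=iters0 * 2**s)
    return w

# Tiny usage example on synthetic least-squares data.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 20))
w_true = rng.normal(size=20)
y = X @ w_true + 0.01 * rng.normal(size=1000)
w_hat = multi_stage_sgd(X, y)
print("parameter error:", np.linalg.norm(w_hat - w_true))
```

The restart-from-returned-solution structure is the common thread with the stage-wise methods the abstract mentions; MAGNET's specific contributions, the negative momentum and its tuning-free schedule, are not reproduced here.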
ArticleNumber 108201
Author Luo, Zhijian
Qian, Yuntao
Chen, Siyu
Hou, Yueen
Author_xml – sequence: 1
  givenname: Zhijian
  surname: Luo
  fullname: Luo, Zhijian
  email: luozhijian@zju.edu.cn
  organization: Institute of Artificial Intelligence, College of Computer Science and Technology, Zhejiang University, Hangzhou, 310027, China
– sequence: 2
  givenname: Siyu
  surname: Chen
  fullname: Chen, Siyu
  email: sychen@zju.edu.cn
  organization: Institute of Artificial Intelligence, College of Computer Science and Technology, Zhejiang University, Hangzhou, 310027, China
– sequence: 3
  givenname: Yuntao
  surname: Qian
  fullname: Qian, Yuntao
  email: ytqian@zju.edu.cn
  organization: Institute of Artificial Intelligence, College of Computer Science and Technology, Zhejiang University, Hangzhou, 310027, China
– sequence: 4
  givenname: Yueen
  surname: Hou
  fullname: Hou, Yueen
  email: houyueen@jyu.edu.cn
  organization: School of Computer, Jiaying University, Meizhou, 514015, China
CitedBy_id crossref_primary_10_1109_TSC_2022_3177316
crossref_primary_10_1016_j_ins_2023_119546
crossref_primary_10_1002_rnc_6479
crossref_primary_10_3390_app15158261
crossref_primary_10_1007_s10957_022_02157_1
crossref_primary_10_1016_j_asoc_2023_110174
ContentType Journal Article
Copyright 2021 Elsevier B.V.
Copyright_xml – notice: 2021 Elsevier B.V.
DOI 10.1016/j.sigpro.2021.108201
Discipline Engineering
EISSN 1872-7557
ExternalDocumentID 10_1016_j_sigpro_2021_108201
S0165168421002395
ISICitedReferencesCount 5
ISSN 0165-1684
IsPeerReviewed true
IsScholarly true
Keywords Stochastic gradient descent
Convex optimization
Multi-stage
Momentum acceleration
Language English
PublicationCentury 2000
PublicationDate November 2021
2021-11-00
PublicationDateYYYYMMDD 2021-11-01
PublicationDate_xml – month: 11
  year: 2021
  text: November 2021
PublicationDecade 2020
PublicationTitle Signal processing
PublicationYear 2021
Publisher Elsevier B.V
Publisher_xml – name: Elsevier B.V
StartPage 108201
SubjectTerms Convex optimization
Momentum acceleration
Multi-stage
Stochastic gradient descent
Title Multi-stage stochastic gradient method with momentum acceleration
URI https://dx.doi.org/10.1016/j.sigpro.2021.108201
Volume 188