SARAH-M: A fast stochastic recursive gradient descent algorithm via momentum

Bibliographic Details
Published in: Expert Systems with Applications, Vol. 238, Article 122295
Main Author: Yang, Zhuang (ORCID: 0000-0002-8374-2928; zhuangyng@163.com), School of Computer Science and Technology, Soochow University, Suzhou, 215006, China
Format: Journal Article
Language: English
Published: Elsevier Ltd, 15.03.2024
Copyright: 2023 Elsevier Ltd
ISSN: 0957-4174
DOI: 10.1016/j.eswa.2023.122295
Online Access: https://dx.doi.org/10.1016/j.eswa.2023.122295
Abstract
As a simple but effective technique, the momentum method has been widely adopted in stochastic optimization algorithms for large-scale machine learning problems, and the success of stochastic optimization with a momentum term has been reported across machine learning and related areas. However, the understanding of how momentum improves the performance of modern variance-reduced stochastic gradient algorithms, e.g., the stochastic dual coordinate ascent (SDCA) method, the stochastically controlled stochastic gradient (SCSG) method, and the stochastic recursive gradient algorithm (SARAH), is still limited. To tackle this issue, this work studies the performance of SARAH with a momentum term theoretically and empirically, and develops a novel variance-reduced stochastic gradient algorithm, termed SARAH-M. We rigorously prove that SARAH-M attains a linear rate of convergence when minimizing strongly convex functions. We further propose an adaptive SARAH-M method (abbreviated as AdaSARAH-M) by incorporating the random Barzilai–Borwein (RBB) technique into SARAH-M, which provides an easy way to determine the step size for the original SARAH-M algorithm. A theoretical analysis showing that AdaSARAH-M also achieves a linear convergence rate is provided. Moreover, we show that the complexity of the proposed algorithms compares favorably with that of modern stochastic optimization algorithms. Finally, numerical results on benchmark machine learning problems, compared with state-of-the-art algorithms, verify the efficacy of momentum in variance-reduced stochastic gradient algorithms.

Highlights:
• The efficacy of the variance-reduced method with momentum is verified.
• An adaptive variance-reduced method with momentum is proposed.
• The convergence properties of the proposed methods are provided.
• Experimental results show great promise on standard machine learning tasks.
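The record does not include the paper's algorithm listing, but the abstract describes SARAH-M as the SARAH recursive gradient estimator combined with a momentum term, and AdaSARAH-M as choosing its step size with a random Barzilai–Borwein (RBB) rule. The Python sketch below illustrates that construction under stated assumptions only: the heavy-ball form of the momentum correction, the logistic-regression objective, the function names, and the BB-style step-size ratio are illustrative choices, not the paper's actual update rules or constants.

```python
# Hypothetical sketch reconstructed from the abstract only; not the paper's code.
import numpy as np


def grad(w, X, y, lam):
    """Average gradient of an l2-regularized logistic loss (labels y in {-1, +1})."""
    p = 1.0 / (1.0 + np.exp(-y * (X @ w)))
    return X.T @ (-y * (1.0 - p)) / len(y) + lam * w


def sarah_m_sketch(X, y, lam=1e-3, eta=0.05, beta=0.3, epochs=10, seed=0):
    """Illustrative SARAH-type loop with a heavy-ball momentum correction.

    Outer step: full gradient. Inner steps: SARAH recursive estimator
    v_t = g_i(w_t) - g_i(w_{t-1}) + v_{t-1}, followed by the assumed update
    w_{t+1} = w_t - eta * v_t + beta * (w_t - w_{t-1}).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w_prev = w = np.zeros(d)
    for _ in range(epochs):
        v = grad(w, X, y, lam)                     # full gradient at the outer step
        w_prev, w = w, w - eta * v
        for _ in range(n):
            i = rng.integers(n)
            xi, yi = X[i:i + 1], y[i:i + 1]
            v = grad(w, xi, yi, lam) - grad(w_prev, xi, yi, lam) + v
            w_prev, w = w, w - eta * v + beta * (w - w_prev)
    return w


def rbb_step_size(w_new, w_old, g_new, g_old, fallback=0.05):
    """BB1-style ratio ||s||^2 / (s^T u) on sampled gradient differences;
    the paper's exact RBB formula and safeguards may differ."""
    s, u = w_new - w_old, g_new - g_old
    denom = float(s @ u)
    return float(s @ s) / denom if denom > 1e-12 else fallback
```

In the adaptive variant described by the abstract, the fixed eta above would be recomputed at each outer step from sampled iterate and gradient differences, e.g. via rbb_step_size; the paper's actual rule is not reproduced in this record.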
GroupedDBID --K
--M
.DC
.~1
0R~
13V
1B1
1RT
1~.
1~5
4.4
457
4G.
5GY
5VS
7-5
71M
8P~
9JN
9JO
AAAKF
AABNK
AACTN
AAEDT
AAEDW
AAIKJ
AAKOC
AALRI
AAOAW
AAQFI
AARIN
AAXKI
AAXUO
AAYFN
ABBOA
ABFNM
ABMAC
ABMVD
ABUCO
ACDAQ
ACGFS
ACHRH
ACNTT
ACRLP
ACZNC
ADBBV
ADEZE
ADTZH
AEBSH
AECPX
AEKER
AENEX
AFJKZ
AFKWA
AFTJW
AGHFR
AGUBO
AGUMN
AGYEJ
AHHHB
AHJVU
AHZHX
AIALX
AIEXJ
AIKHN
AITUG
AJOXV
AKRWK
ALEQD
ALMA_UNASSIGNED_HOLDINGS
AMFUW
AMRAJ
AOUOD
APLSM
AXJTR
BJAXD
BKOJK
BLXMC
BNSAS
CS3
DU5
EBS
EFJIC
EO8
EO9
EP2
EP3
F5P
FDB
FIRID
FNPLU
FYGXN
G-Q
GBLVA
GBOLZ
HAMUX
IHE
J1W
JJJVA
KOM
LG9
LY1
LY7
M41
MO0
N9A
O-L
O9-
OAUVE
OZT
P-8
P-9
P2P
PC.
PQQKQ
Q38
RIG
ROL
RPZ
SDF
SDG
SDP
SDS
SES
SEW
SPC
SPCBC
SSB
SSD
SSL
SST
SSV
SSZ
T5K
TN5
~G-
29G
9DU
AAAKG
AAQXK
AATTM
AAYWO
AAYXX
ABJNI
ABKBG
ABUFD
ABWVN
ABXDB
ACLOT
ACNNM
ACRPL
ACVFH
ADCNI
ADJOM
ADMUD
ADNMO
AEIPS
AEUPX
AFPUW
AGQPQ
AIGII
AIIUN
AKBMS
AKYEP
ANKPU
APXCP
ASPBG
AVWKF
AZFZN
CITATION
EFKBS
EFLBG
EJD
FEDTE
FGOYB
G-2
HLZ
HVGLF
HZ~
R2-
SBC
SET
WUQ
XPP
ZMT
~HD
Keywords: Adaptive step size; Momentum; Stochastic optimization; Machine learning; Variance reduction