SARAH-M: A fast stochastic recursive gradient descent algorithm via momentum
| Published in: | Expert Systems with Applications, Vol. 238, Article 122295 |
|---|---|
| Main Author: | Yang, Zhuang (School of Computer Science and Technology, Soochow University, Suzhou 215006, China; ORCID 0000-0002-8374-2928; zhuangyng@163.com) |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Ltd, 15.03.2024 |
| Subjects: | Adaptive step size; Machine learning; Momentum; Stochastic optimization; Variance reduction |
| ISSN: | 0957-4174 |
| DOI: | 10.1016/j.eswa.2023.122295 |
| Copyright: | © 2023 Elsevier Ltd |
| Online Access: | https://dx.doi.org/10.1016/j.eswa.2023.122295 |

| Abstract | The momentum method is a simple but effective technique that has been widely adopted in stochastic optimization algorithms for large-scale machine learning, and its success has been reported across many applications in machine learning and related areas. However, the understanding of how momentum improves modern variance-reduced stochastic gradient algorithms, e.g., the stochastic dual coordinate ascent (SDCA) method, the stochastically controlled stochastic gradient (SCSG) method, and the stochastic recursive gradient algorithm (SARAH), is still limited. To tackle this issue, this work studies the performance of SARAH with a momentum term theoretically and empirically, and develops a novel variance-reduced stochastic gradient algorithm, termed SARAH-M. We rigorously prove that SARAH-M attains a linear rate of convergence when minimizing strongly convex functions. We further propose an adaptive SARAH-M method (abbreviated AdaSARAH-M) by incorporating the random Barzilai–Borwein (RBB) technique into SARAH-M, which provides an easy way to determine the step size for the original SARAH-M algorithm. A theoretical analysis showing that AdaSARAH-M also achieves a linear convergence rate is provided. Moreover, we show that the complexity of the proposed algorithms can outperform that of modern stochastic optimization algorithms. Finally, numerical results, compared with state-of-the-art algorithms on benchmark machine learning problems, verify the efficacy of momentum in variance-reduced stochastic gradient algorithms. |
|---|---|
| Highlights | • The efficacy of the variance reduced method with momentum is verified. • An adaptive variance reduced method with momentum is proposed. • The convergence properties of the proposed methods are provided. • Experimental results show great promise in standard machine learning tasks. |
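
The record does not include the algorithm itself, so the following is only a minimal sketch of how a SARAH-style recursive gradient estimator can be combined with a heavy-ball momentum term, in the spirit of the method described in the abstract. The function name `sarah_m`, the parameter values, and the exact placement of the momentum term are illustrative assumptions and may differ from the SARAH-M update analyzed in the paper.

```python
import numpy as np

def sarah_m(grad_full, grad_i, w0, n, eta=0.02, beta=0.3, epochs=10, m=None):
    """Sketch of a SARAH-style recursive gradient method with a heavy-ball
    momentum term added to the iterate update (an assumed form; the exact
    SARAH-M update in the paper may differ).

    grad_full(w) -> (1/n) * sum_i grad f_i(w)   (full gradient)
    grad_i(w, i) -> grad f_i(w)                 (component gradient)
    """
    m = m if m is not None else n            # inner-loop length, commonly O(n)
    w_prev, w = w0.copy(), w0.copy()
    for _ in range(epochs):
        v = grad_full(w)                     # full gradient at the snapshot point
        w_prev, w = w, w - eta * v           # first, momentum-free step
        for _ in range(m):
            i = np.random.randint(n)
            # SARAH recursion: v_t = grad_i(w_t) - grad_i(w_{t-1}) + v_{t-1}
            v = grad_i(w, i) - grad_i(w_prev, i) + v
            # assumed momentum: w_{t+1} = w_t - eta * v_t + beta * (w_t - w_{t-1})
            w_prev, w = w, w - eta * v + beta * (w - w_prev)
    return w

# Toy usage: l2-regularized least squares, f_i(w) = 0.5*(a_i @ w - b_i)^2 + 0.5*lam*||w||^2
rng = np.random.default_rng(0)
A, b, lam = rng.normal(size=(200, 10)), rng.normal(size=200), 1e-2
full_grad = lambda w: A.T @ (A @ w - b) / len(b) + lam * w
comp_grad = lambda w, i: A[i] * (A[i] @ w - b[i]) + lam * w
w_hat = sarah_m(full_grad, comp_grad, np.zeros(10), n=len(b))
```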
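The abstract also mentions choosing the step size with the random Barzilai–Borwein (RBB) technique. Below is a minimal sketch of the classical BB1 ratio ||s||² / (sᵀy) with the gradient difference y estimated on a randomly drawn mini-batch; the exact RBB variant and scaling used by AdaSARAH-M are not given in this record, so the batch size and safeguarding constant here are illustrative assumptions.

```python
import numpy as np

def rbb_step_size(grad_i, w_curr, w_prev, n, batch_size=32, eps=1e-12, rng=None):
    """Random Barzilai-Borwein (RBB) step size sketch: the BB1 ratio
    eta = ||s||^2 / (s @ y), with the gradient difference y estimated on a
    randomly drawn mini-batch (the scaling used by AdaSARAH-M may differ)."""
    rng = rng or np.random.default_rng()
    idx = rng.choice(n, size=min(batch_size, n), replace=False)
    s = w_curr - w_prev                                        # iterate difference
    y = np.mean([grad_i(w_curr, i) - grad_i(w_prev, i) for i in idx], axis=0)
    return float(s @ s) / max(float(s @ y), eps)               # guard against non-positive curvature
```

In an adaptive outer loop, a rule of this kind would be evaluated once per epoch from the two most recent snapshot iterates and would replace the hand-tuned eta in the sketch above.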