Scaling up stochastic gradient descent for non-convex optimisation
| Published in: | Machine Learning, Vol. 111, Issue 11, pp. 4039-4079 |
|---|---|
| Main authors: | Mohamad, Saad; Alamri, Hamad; Bouchachia, Abdelhamid |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: Springer US, 01.11.2022 (Springer Nature B.V) |
| Subjects: | Stochastic gradient descent; Large scale non-convex optimisation; Distributed and parallel computation; Variational inference; Deep reinforcement learning |
| ISSN: | 0885-6125, 1573-0565 |
| Online access: | Full text |
| Abstract | Stochastic gradient descent (SGD) is a widely adopted iterative method for optimizing differentiable objective functions. In this paper, we propose and discuss a novel approach to scale up SGD in applications involving non-convex functions and large datasets. We address the bottleneck problem arising when using both shared and distributed memory. Typically, the former is bounded by limited computation resources and bandwidth, whereas the latter suffers from communication overheads. We propose a unified distributed and parallel implementation of SGD (named DPSGD) that relies on both asynchronous distribution and lock-free parallelism. By combining the two strategies into a unified framework, DPSGD is able to strike a better trade-off between local computation and communication. The convergence properties of DPSGD are studied for non-convex problems such as those arising in statistical modelling and machine learning. Our theoretical analysis shows that DPSGD leads to speed-up with respect to the number of cores and number of workers while guaranteeing an asymptotic convergence rate of $$O(1/\sqrt{T})$$, given that the number of cores is bounded by $$T^{1/4}$$ and the number of workers is bounded by $$T^{1/2}$$, where $$T$$ is the number of iterations. The potential gains that can be achieved by DPSGD are demonstrated empirically on a stochastic variational inference problem (Latent Dirichlet Allocation) and on a deep reinforcement learning (DRL) problem (advantage actor-critic, A2C), resulting in two algorithms: DPSVI and HSA2C. Empirical results validate our theoretical findings. Comparative studies are conducted to show the performance of the proposed DPSGD against state-of-the-art DRL algorithms. |
|---|---|
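The lock-free parallelism component described in the abstract follows the Hogwild!-style line of work. As a rough single-machine illustration only (this is not the paper's DPSGD; the data generator, step counts, and learning rate below are hypothetical choices for the sketch), several Python threads update a shared parameter vector with plain SGD and no locking:

```python
import random
import threading

def make_data(n=2000, true_w=(2.0, -3.0), seed=0):
    """Synthetic linear-regression data: y = w . x + small noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        y = true_w[0] * x[0] + true_w[1] * x[1] + rng.gauss(0, 0.01)
        data.append((x, y))
    return data

def worker(w, data, steps, lr, seed):
    """One lock-free SGD worker: reads and writes the shared list `w`
    with no synchronisation (updates from threads may interleave)."""
    rng = random.Random(seed)
    for _ in range(steps):
        x, y = data[rng.randrange(len(data))]
        err = w[0] * x[0] + w[1] * x[1] - y   # prediction error on one sample
        w[0] -= lr * 2 * err * x[0]           # unsynchronised gradient step
        w[1] -= lr * 2 * err * x[1]

def hogwild_sgd(data, n_threads=4, steps=5000, lr=0.05):
    w = [0.0, 0.0]  # shared parameters, deliberately unprotected by any lock
    threads = [threading.Thread(target=worker, args=(w, data, steps, lr, s))
               for s in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w

if __name__ == "__main__":
    w = hogwild_sgd(make_data())
    print([round(v, 2) for v in w])  # should land near the true weights (2.0, -3.0)
```

The reason lock-free updates are tolerable here is that occasional overwritten gradient steps act like extra stochastic noise; the analysis in the paper bounds how much such staleness (and its distributed-memory counterpart, delayed worker updates) can be tolerated before the $$O(1/\sqrt{T})$$ rate is lost.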
| Author | Mohamad, Saad; Alamri, Hamad; Bouchachia, Abdelhamid |
| Affiliations | Saad Mohamad: Department of Computing, Bournemouth University. Hamad Alamri: WMG, Warwick University. Abdelhamid Bouchachia: Department of Computing, Bournemouth University (abouchachia@bournemouth.ac.uk; ORCID 0000-0002-1980-5517) |
| Copyright | The Author(s) 2022. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
| DOI | 10.1007/s10994-022-06243-3 |
| Discipline | Computer Science |
| EISSN | 1573-0565 |
| EndPage | 4079 |
| Funding | Horizon 2020 Framework Programme, grant 687691 (http://dx.doi.org/10.13039/100010661) |
| ISSN | 0885-6125 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 11 |
| Keywords | Deep reinforcement learning; Distributed and parallel computation; Stochastic gradient descent; Variational inference; Large scale non-convex optimisation |
| Language | English |
| ORCID | 0000-0002-1980-5517 |
| OpenAccessLink | https://link.springer.com/10.1007/s10994-022-06243-3 |
| PageCount | 41 |
| PublicationDate | 2022-11-01 |
| PublicationPlace | New York |
| PublicationTitle | Machine learning |
| PublicationTitleAbbrev | Mach Learn |
| PublicationYear | 2022 |
| Publisher | Springer US; Springer Nature B.V |
| StartPage | 4039 |
| SubjectTerms | Algorithms; Artificial Intelligence; Comparative studies; Computation; Computer Science; Control; Convergence; Deep learning; Dirichlet problem; Distributed memory; Empirical analysis; Iterative methods; Machine Learning; Mechatronics; Natural Language Processing (NLP); Optimization; Robotics; Scaling up; Simulation and Modeling; Special Issue: Foundations of Data Science; Statistical models |
| Title | Scaling up stochastic gradient descent for non-convex optimisation |
| URI | https://link.springer.com/article/10.1007/s10994-022-06243-3; https://www.proquest.com/docview/2739310256 |
| Volume | 111 |