Scaling up stochastic gradient descent for non-convex optimisation

Detailed description

Bibliographic details
Published in: Machine Learning, Vol. 111, Issue 11, pp. 4039–4079
Main authors: Mohamad, Saad; Alamri, Hamad; Bouchachia, Abdelhamid
Format: Journal Article
Language: English
Published: New York: Springer US, 01.11.2022 (Springer Nature B.V.)
ISSN: 0885-6125, 1573-0565
Online access: Full text
Abstract Stochastic gradient descent (SGD) is a widely adopted iterative method for optimizing differentiable objective functions. In this paper, we propose and discuss a novel approach to scale up SGD in applications involving non-convex functions and large datasets. We address the bottleneck problem arising when using both shared and distributed memory. Typically, the former is bounded by limited computation resources and bandwidth, whereas the latter suffers from communication overheads. We propose a unified distributed and parallel implementation of SGD (named DPSGD) that relies on both asynchronous distribution and lock-free parallelism. By combining the two strategies into a unified framework, DPSGD is able to strike a better trade-off between local computation and communication. The convergence properties of DPSGD are studied for non-convex problems such as those arising in statistical modelling and machine learning. Our theoretical analysis shows that DPSGD leads to speed-up with respect to the number of cores and number of workers while guaranteeing an asymptotic convergence rate of O(1/√T), given that the number of cores is bounded by T^(1/4) and the number of workers is bounded by T^(1/2), where T is the number of iterations. The potential gains that can be achieved by DPSGD are demonstrated empirically on a stochastic variational inference problem (Latent Dirichlet Allocation) and on a deep reinforcement learning (DRL) problem (advantage actor-critic, A2C), resulting in two algorithms: DPSVI and HSA2C. Empirical results validate our theoretical findings. Comparative studies are conducted to show the performance of the proposed DPSGD against state-of-the-art DRL algorithms.
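One of the abstract's two ingredients, lock-free parallelism, can be illustrated with a minimal Hogwild!-style toy sketch (an assumption-laden illustration, not the paper's DPSGD: the function `lockfree_sgd`, the least-squares objective, and all parameters below are invented for this example). Several threads draw random examples and write to a shared weight vector with no locking, tolerating races:

```python
import threading
import numpy as np

def lockfree_sgd(X, y, n_threads=4, n_steps=2000, lr=0.01):
    """Toy lock-free parallel SGD on a least-squares objective.

    Threads update the shared vector `w` in place without any lock,
    in the spirit of Hogwild!-style algorithms; update races are tolerated.
    """
    n, d = X.shape
    w = np.zeros(d)  # shared parameters, mutated by all threads

    def worker(seed):
        rng = np.random.default_rng(seed)
        for _ in range(n_steps):
            i = rng.integers(n)                  # sample one data point
            grad = (X[i] @ w - y[i]) * X[i]      # grad of 0.5*(x_i·w - y_i)^2
            w[:] = w - lr * grad                 # unsynchronized in-place write

    threads = [threading.Thread(target=worker, args=(s,)) for s in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w

# Recover w_true = [2, -3] from noisy linear measurements.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -3.0]) + 0.01 * rng.normal(size=200)
w = lockfree_sgd(X, y)
```

Per the abstract, the paper's analysis shows such asynchronous schemes can retain the O(1/√T) non-convex rate provided the degree of parallelism is bounded (number of cores at most T^(1/4), number of workers at most T^(1/2)); the sketch above illustrates only the shared-memory, lock-free half, not the distributed asynchronous half.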
Authors:
– Mohamad, Saad (Department of Computing, Bournemouth University)
– Alamri, Hamad (WMG, Warwick University)
– Bouchachia, Abdelhamid (Department of Computing, Bournemouth University; ORCID 0000-0002-1980-5517; abouchachia@bournemouth.ac.uk)
Copyright: The Author(s) 2022. This work is published under http://creativecommons.org/licenses/by/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI: 10.1007/s10994-022-06243-3
Funding: Horizon 2020 Framework Programme, grant 687691 (funder ID: http://dx.doi.org/10.13039/100010661)
Open access: Yes
Peer reviewed: Yes
Keywords: Deep reinforcement learning; Distributed and parallel computation; Stochastic gradient descent; Variational inference; Large scale non-convex optimisation
Open access link: https://link.springer.com/10.1007/s10994-022-06243-3
Page count: 41
Subject terms: Algorithms; Artificial Intelligence; Comparative studies; Computation; Computer Science; Control; Convergence; Deep learning; Dirichlet problem; Distributed memory; Empirical analysis; Iterative methods; Machine Learning; Mechatronics; Natural Language Processing (NLP); Optimization; Robotics; Scaling up; Simulation and Modeling; Special Issue: Foundations of Data Science; Statistical models
URI: https://link.springer.com/article/10.1007/s10994-022-06243-3
https://www.proquest.com/docview/2739310256