Convergence of Constant Step Stochastic Gradient Descent for Non-Smooth Non-Convex Functions
This paper studies the asymptotic behavior of the constant step Stochastic Gradient Descent for the minimization of an unknown function, defined as the expectation of a non convex, non smooth, locally Lipschitz random function. As the gradient may not exist, it is replaced by a certain operator: a r...
| Published in: | Set-valued and variational analysis Vol. 30; No. 3; pp. 1117 - 1147 |
|---|---|
| Main Authors: | Bianchi, Pascal; Hachem, Walid; Schechtman, Sholom |
| Format: | Journal Article |
| Language: | English |
| Published: | Dordrecht: Springer Netherlands, 2022-09-01 (Springer Nature B.V; Springer) |
| Subjects: | |
| ISSN: | 1877-0533, 1877-0541 |
| Online Access: | Full text |
| Abstract | This paper studies the asymptotic behavior of the constant step Stochastic Gradient Descent for the minimization of an unknown function, defined as the expectation of a non-convex, non-smooth, locally Lipschitz random function. As the gradient may not exist, it is replaced by a certain operator: a reasonable choice is to use an element of the Clarke subdifferential of the random function; another choice is the output of the celebrated backpropagation algorithm, which is popular amongst practitioners, and whose properties have recently been studied by Bolte and Pauwels. Since the expectation of the chosen operator is not in general an element of the Clarke subdifferential of the mean function, it has been assumed in the literature that an oracle of the Clarke subdifferential of the mean function is available. As a first result, it is shown in this paper that such an oracle is not needed for almost all initialization points of the algorithm. Next, in the small step size regime, it is shown that the interpolated trajectory of the algorithm converges in probability (in the compact convergence sense) towards the set of solutions of a particular differential inclusion: the subgradient flow. Finally, viewing the iterates as a Markov chain whose transition kernel is indexed by the step size, it is shown that the invariant distributions of the kernel converge weakly to the set of invariant distributions of this differential inclusion as the step size tends to zero. These results show that when the step size is small, with large probability, the iterates eventually lie in a neighborhood of the critical points of the mean function. |
|---|---|
| AbstractList | This paper studies the asymptotic behavior of the constant step Stochastic Gradient Descent for the minimization of an unknown function F, defined as the expectation of a non-convex, non-smooth, locally Lipschitz random function. As the gradient may not exist, it is replaced by a certain operator: a reasonable choice is to use an element of the Clarke subdifferential of the random function; another choice is the output of the celebrated backpropagation algorithm, which is popular amongst practitioners, and whose properties have recently been studied by Bolte and Pauwels [7]. Since the expectation of the chosen operator is not in general an element of the Clarke subdifferential ∂F of the mean function, it has been assumed in the literature that an oracle of ∂F is available. As a first result, it is shown in this paper that such an oracle is not needed for almost all initialization points of the algorithm. Next, in the small step size regime, it is shown that the interpolated trajectory of the algorithm converges in probability (in the compact convergence sense) towards the set of solutions of the differential inclusion ẋ(t) ∈ −∂F(x(t)). Finally, viewing the iterates as a Markov chain whose transition kernel is indexed by the step size, it is shown that the invariant distributions of the kernel converge weakly to the set of invariant distributions of this differential inclusion as the step size tends to zero. These results show that when the step size is small, with large probability, the iterates eventually lie in a neighborhood of the critical points of the mean function F. |
| Author | Schechtman, Sholom; Hachem, Walid; Bianchi, Pascal |
| Author_xml | – sequence: 1 givenname: Pascal surname: Bianchi fullname: Bianchi, Pascal organization: LTCI, Telecom Paris – sequence: 2 givenname: Walid surname: Hachem fullname: Hachem, Walid organization: LIGM, CNRS, Université Gustave Eiffel – sequence: 3 givenname: Sholom orcidid: 0000-0002-5390-4279 surname: Schechtman fullname: Schechtman, Sholom email: sholom.schechtman@univ-eiffel.fr organization: LIGM, CNRS, Université Gustave Eiffel |
| BackLink | https://hal.science/hal-02564349$$DView record in HAL |
| BookMark | eNp9UE1PAjEUbAwmAvoHPG3iyUO1X-x2jwQFTIge0JtJU0sLS6DFthDl19tl_Ug8kLR9r68zk-l0QMs6qwG4xOgGI1TcBowJ4RARAhHKKYf7E9DGvCgg6jHc-u0pPQOdEJaJg1CJ2-B14OxO-7m2SmfOZOkaorQxm0a9SYdTCxlipbKRl7NKp4c7HVRdjfPZo7NwunYuLg7tQesjG26tilUSOgenRq6CvviuXfAyvH8ejOHkafQw6E-goj0aITGYybeCcZ02V6VhrOAzpIqZ1Iobo8s0QporgqVRDOFcSS7TolopwiTtgutGdyFXYuOrtfSfwslKjPsTUc8Q6eWMsnJHE_aqwW68e9_qEMXSbb1N9gTJS4JpjosyoXiDUt6F4LURqoqy_lT0sloJjESdu2hyT_pEHHIX-0Ql_6g_jo6SaEMKCWzn2v-5OsL6ArzfmIQ |
| CitedBy_id | crossref_primary_10_1007_s10107_023_01936_6 crossref_primary_10_1007_s10107_023_02020_9 crossref_primary_10_1007_s10957_024_02408_3 crossref_primary_10_1007_s10915_025_02798_0 crossref_primary_10_1007_s10957_022_02093_0 crossref_primary_10_1287_moor_2022_0289 crossref_primary_10_1287_moor_2021_0194 crossref_primary_10_1137_21M1468450 crossref_primary_10_1007_s10107_025_02245_w crossref_primary_10_1016_j_camwa_2024_03_025 crossref_primary_10_1137_23M1619733 crossref_primary_10_1137_22M1479178 crossref_primary_10_1137_22M1513034 |
| Cites_doi | 10.1007/978-3-642-75894-2 10.1023/B:CASA.0000012091.84864.65 10.1137/110844192 10.1142/S0219493712500116 10.1007/BF02742069 10.1007/3-540-29587-9 10.4064/ap-54-1-85-91 10.1137/080722059 10.1090/S0002-9947-1979-0546911-1 10.1080/17442508.2018.1539086 10.1137/S0363012904439301 10.1137/060670080 10.1007/978-3-642-69512-4 10.1215/S0012-7094-96-08416-1 10.1007/s11590-020-01537-8 10.1007/s10107-020-01501-5 10.1017/CBO9780511626630 10.1007/s10208-018-09409-5 10.1007/BF01099354 |
| ContentType | Journal Article |
| Copyright | The Author(s), under exclusive licence to Springer Nature B.V. 2022. Distributed under a Creative Commons Attribution 4.0 International License |
| Copyright_xml | – notice: The Author(s), under exclusive licence to Springer Nature B.V. 2022 – notice: The Author(s), under exclusive licence to Springer Nature B.V. 2022. – notice: Distributed under a Creative Commons Attribution 4.0 International License |
| DBID | AAYXX CITATION 1XC VOOES |
| DOI | 10.1007/s11228-022-00638-z |
| DatabaseName | CrossRef Hyper Article en Ligne (HAL) Hyper Article en Ligne (HAL) (Open Access) |
| DatabaseTitle | CrossRef |
| DatabaseTitleList | |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Mathematics |
| EISSN | 1877-0541 |
| EndPage | 1147 |
| ExternalDocumentID | oai:HAL:hal-02564349v3 10_1007_s11228_022_00638_z |
| GrantInformation_xml | – fundername: Conseil Régional, Île-de-France funderid: https://doi.org/10.13039/501100003990 |
| ID | FETCH-LOGICAL-c353t-2f14ab748e7488c9f4478d0c7daec8ffe99f40e8c21afc4016ca8aa8a3ecc24a3 |
| IEDL.DBID | RSV |
| ISICitedReferencesCount | 23 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000781283100001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 1877-0533 |
| IngestDate | Sun Oct 19 06:20:28 EDT 2025 Thu Sep 25 00:47:38 EDT 2025 Sat Nov 29 01:59:32 EST 2025 Tue Nov 18 22:25:36 EST 2025 Fri Feb 21 02:45:02 EST 2025 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 3 |
| Keywords | Clarke subdifferential; Backpropagation algorithm; Differential inclusions; Stochastic approximation; Non convex and non smooth optimization; MSC: 65K05, 65K10, 90C15, 34A60 |
| Language | English |
| License | Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0 |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-c353t-2f14ab748e7488c9f4478d0c7daec8ffe99f40e8c21afc4016ca8aa8a3ecc24a3 |
| Notes | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 |
| ORCID | 0000-0002-5390-4279 0000-0001-8499-2761 |
| OpenAccessLink | https://hal.science/hal-02564349 |
| PQID | 2692136179 |
| PQPubID | 2044238 |
| PageCount | 31 |
| ParticipantIDs | hal_primary_oai_HAL_hal_02564349v3 proquest_journals_2692136179 crossref_citationtrail_10_1007_s11228_022_00638_z crossref_primary_10_1007_s11228_022_00638_z springer_journals_10_1007_s11228_022_00638_z |
| PublicationCentury | 2000 |
| PublicationDate | 2022-09-01 |
| PublicationDateYYYYMMDD | 2022-09-01 |
| PublicationDate_xml | – month: 09 year: 2022 text: 2022-09-01 day: 01 |
| PublicationDecade | 2020 |
| PublicationPlace | Dordrecht |
| PublicationPlace_xml | – name: Dordrecht |
| PublicationSubtitle | Theory and Applications |
| PublicationTitle | Set-valued and variational analysis |
| PublicationTitleAbbrev | Set-Valued Var. Anal |
| PublicationYear | 2022 |
| Publisher | Springer Netherlands; Springer Nature B.V; Springer |
| Publisher_xml | – name: Springer Netherlands – name: Springer Nature B.V – name: Springer |
| References | Benveniste, A., Métivier, M., Priouret, P.: Adaptive Algorithms and Stochastic Approximations, Applications of Mathematics (New York), vol. 22. Springer, Berlin (1990). https://doi.org/10.1007/978-3-642-75894-2. Translated from the French by Stephen S. Wilson; Ruszczyński, A.: Convergence of a stochastic subgradient method with averaging for nonsmooth nonconvex constrained optimization. Optim. Lett. 14 (2020). https://doi.org/10.1007/s11590-020-01537-8; Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic Differentiation in PyTorch. In: NIPS-W (2017); Kushner, H.J., Yin, G.G.: Stochastic Approximation and Recursive Algorithms and Applications, Applications of Mathematics (New York), 2nd edn., vol. 35. Springer, New York (2003). Stochastic Modelling and Applied Probability; Has’minskiĭ, R.Z.: The averaging principle for parabolic and elliptic differential equations and Markov processes with small diffusion. Teor. Verojatnost. i Primenen. 8, 3–25 (1963); Majewski, S., Miasojedow, B., Moulines, E.: Analysis of nonsmooth stochastic approximation: the differential inclusion approach. arXiv:1805.01916 (2018); Davis, D., Drusvyatskiy, D., Kakade, S., Lee, J.D.: Stochastic subgradient method converges on tame functions. Found. Comput. Math. 20, 119–154 (2020). https://doi.org/10.1007/s10208-018-09409-5; Aubin, J.P., Cellina, A.: Differential Inclusions, Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 264. Springer, Berlin (1984). https://doi.org/10.1007/978-3-642-69512-4. Set-valued maps and viability theory; Faure, M., Roth, G.: Ergodic properties of weak asymptotic pseudotrajectories for set-valued dynamical systems. Stoch. Dyn. 13(1), 1250011, 23 (2013). https://doi.org/10.1142/S0219493712500116; Clarke, F.H., Ledyaev, Y.S., Stern, R.J., Wolenski, P.R.: Nonsmooth Analysis and Control Theory, Graduate Texts in Mathematics, vol. 178. Springer, New York (1998); Aliprantis, C.D., Border, K.C.: Infinite Dimensional Analysis: a Hitchhiker’s Guide. Springer, Berlin (2006). https://doi.org/10.1007/3-540-29587-9; Meyn, S., Tweedie, R.L.: Markov Chains and Stochastic Stability, 2nd edn. Cambridge University Press, New York (2009). https://doi.org/10.1017/CBO9780511626630; Ermoliev, Y., Norkin, V.: Stochastic generalized gradient method for solving nonconvex nonsmooth stochastic optimization problems. Cybern. Syst. Anal. 34(2), 196–215 (1998). https://doi.org/10.1007/BF02742069. http://pure.iiasa.ac.at/id/eprint/5415; Aubin, J.P., Frankowska, H., Lasota, A.: Poincaré’s recurrence theorem for set-valued dynamical systems. Ann. Polon. Math. 54(1), 85–91 (1991). https://doi.org/10.4064/ap-54-1-85-91; van den Dries, L., Miller, C.: Geometric categories and o-minimal structures. Duke Math. J. 84(2), 497–540 (1996). https://doi.org/10.1215/S0012-7094-96-08416-1; Bolte, J., Daniilidis, A., Lewis, A., Shiota, M.: Clarke subgradients of stratifiable functions. SIAM J. Optim. 18(2), 556–572 (2007). https://doi.org/10.1137/060670080; Norkin, V.: Generalized-differentiable functions. Cybern. Syst. Anal. 16, 10–12 (1980). https://doi.org/10.1007/BF01099354; Bolte, J., Pauwels, E.: Conservative set valued fields, automatic differentiation, stochastic gradient method and deep learning. arXiv:1909.10300 (2019); Benaïm, M., Hofbauer, J., Sorin, S.: Stochastic approximations and differential inclusions. SIAM J. Control Optim. 44(1), 328–348 (2005) (electronic). https://doi.org/10.1137/S0363012904439301; Kakade, S., Lee, J.D.: Provably correct automatic sub-differentiation for qualified programs. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 31, pp. 7125–7135. Curran Associates, Inc. (2018). http://papers.nips.cc/paper/7943-provably-correct-automatic-sub-differentiation-for-qualified-programs.pdf; Mikhalevich, V., Gupal, A., Norkin, V.: Methods of nonconvex optimization. Nauka (1987); Folland, G.: Real Analysis: Modern Techniques and Their Applications. Pure and Applied Mathematics: A Wiley Series of Texts, Monographs and Tracts. Wiley, Hoboken (2013). https://books.google.fr/books?id=wI4fAwAAQBAJ; Lebourg, G.: Generic differentiability of Lipschitzian functions. Transactions of the American Mathematical Society 256, 125–144 (1979). https://doi.org/10.1090/S0002-9947-1979-0546911-1. http://www.jstor.org/stable/1998104; Ioffe, A.D.: An invitation to tame optimization. SIAM J. Optim. 19(4), 1894–1917 (2009). https://doi.org/10.1137/080722059; Ermoliev, Y.M., Norkin, V.: Solution of nonconvex nonsmooth stochastic optimization problems. Cybern. Syst. Anal. 39(5), 701–715 (2003). https://doi.org/10.1023/B:CASA.0000012091.84864.65; Bianchi, P., Hachem, W., Salim, A.: Constant step stochastic approximations involving differential inclusions: stability, long-run convergence and applications. Stochastics 91(2), 288–320 (2019). https://doi.org/10.1080/17442508.2018.1539086; Roth, G., Sandholm, W.H.: Stochastic approximations with constant step size and differential inclusions. SIAM J. Control Optim. 51(1), 525–555 (2013). https://doi.org/10.1137/110844192 |
| SSID | ssj0070091 |
| Score | 2.5110884 |
| Snippet | This paper studies the asymptotic behavior of the constant step Stochastic Gradient Descent for the minimization of an unknown function, defined as the... This paper studies the asymptotic behavior of the constant step Stochastic Gradient Descent for the minimization of an unknown function F , defined as the... |
| SourceID | hal proquest crossref springer |
| SourceType | Open Access Repository Aggregation Database Enrichment Source Index Database Publisher |
| StartPage | 1117 |
| SubjectTerms | Algorithms Analysis Asymptotic properties Back propagation Convergence Critical point Invariants Kernels Markov chains Mathematics Mathematics and Statistics Numerical Analysis Operators (mathematics) Optimization Optimization and Control Probability theory |
| Title | Convergence of Constant Step Stochastic Gradient Descent for Non-Smooth Non-Convex Functions |
| URI | https://link.springer.com/article/10.1007/s11228-022-00638-z https://www.proquest.com/docview/2692136179 https://hal.science/hal-02564349 |
| Volume | 30 |
| WOSCitedRecordID | wos000781283100001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| journalDatabaseRights | – providerCode: PRVAVX databaseName: Springer Journals New Starts & Take-Overs Collection customDbUrl: eissn: 1877-0541 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0070091 issn: 1877-0533 databaseCode: RSV dateStart: 20090601 isFulltext: true titleUrlDefault: https://link.springer.com/search?facet-content-type=%22Journal%22 providerName: Springer Nature |
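
The constant step scheme described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy random function f(x, ξ) = |x² − 1| + ξ·x with centered Gaussian ξ, the function names `clarke_subgradient` and `sgd_constant_step`, and the parameter values. The mean function is then F(x) = |x² − 1|, which is non-convex and non-smooth, with Clarke critical points at x = 0 and x = ±1.

```python
import random


def clarke_subgradient(x: float, xi: float) -> float:
    """Return one element of the Clarke subdifferential of f(., xi) at x.

    Hypothetical toy function: f(x, xi) = |x**2 - 1| + xi * x.
    """
    u = x * x - 1.0
    if u > 0.0:
        return 2.0 * x + xi
    if u < 0.0:
        return -2.0 * x + xi
    # At the kinks x = +1 or x = -1 the Clarke subdifferential is the
    # interval [-2, 2] + xi; we select its midpoint.
    return xi


def sgd_constant_step(x0: float, step: float, n_iter: int,
                      seed: int = 0, noise_std: float = 1.0) -> list:
    """Run x_{n+1} = x_n - step * g(x_n, xi_n) with a fixed step size.

    Returns the full list of iterates, starting with x0.
    """
    rng = random.Random(seed)
    xs = [x0]
    x = x0
    for _ in range(n_iter):
        xi = rng.gauss(0.0, noise_std)  # fresh sample of the noise
        x = x - step * clarke_subgradient(x, xi)
        xs.append(x)
    return xs


if __name__ == "__main__":
    iterates = sgd_constant_step(x0=2.5, step=0.01, n_iter=5000)
    tail = iterates[-1000:]
    print("mean of |x| over the last 1000 iterates:",
          sum(abs(x) for x in tail) / len(tail))
```

With a small step, the iterates of this toy run do not converge to a point but keep fluctuating in a neighborhood of a critical point of F, which is the qualitative behavior the abstract describes. The sketch makes no claim about the paper's assumptions or proof techniques.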