A deterministic gradient-based approach to avoid saddle points
| Published in: | European journal of applied mathematics, Vol. 34, No. 4, pp. 738-757 |
|---|---|
| Main authors: | Kreusser, L. M.; Osher, S. J.; Wang, B. |
| Format: | Journal Article |
| Language: | English |
| Published: | United States: Cambridge University Press, 01.08.2023 |
| Keywords: | Mathematics |
| ISSN: | 0956-7925, 1469-4425 |
| Online access: | Full text |
| Abstract | Loss functions with a large number of saddle points are one of the major obstacles for training modern machine learning (ML) models efficiently. First-order methods such as gradient descent (GD) are usually the methods of choice for training ML models. However, these methods converge to saddle points for certain choices of initial guesses. In this paper, we propose a modification of the recently proposed Laplacian smoothing gradient descent (LSGD) [Osher et al., arXiv:1806.06317], called modified LSGD (mLSGD), and demonstrate its potential to avoid saddle points without sacrificing the convergence rate. Our analysis is based on the attraction region, formed by all starting points for which the considered numerical scheme converges to a saddle point. We investigate the attraction region's dimension both analytically and numerically. For a canonical class of quadratic functions, we show that the dimension of the attraction region for mLSGD is $\lfloor (n-1)/2\rfloor$, and hence it is significantly smaller than that of GD, whose dimension is $n-1$. |
|---|---|
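The abstract's central claim — that GD converges to a saddle point for certain initial guesses, namely those in the attraction region — can be illustrated with a small numerical sketch. This is not the paper's mLSGD scheme; it is plain gradient descent on a canonical quadratic with a strict saddle, and the matrix `A`, step size `eta`, and iteration count are illustrative choices.

```python
import numpy as np

# Illustration (not the paper's mLSGD): gradient descent on the canonical
# quadratic f(x) = 0.5 * x^T A x, where A has eigenvalues of mixed sign,
# so the origin is a strict saddle point.
A = np.diag([1.0, -1.0])   # eigenvalues +1 (stable) and -1 (unstable)
eta = 0.1                  # step size

def grad(x):
    return A @ x

def run_gd(x0, iters=200):
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        x = x - eta * grad(x)
    return x

# A start inside the saddle's attraction region (here the stable
# subspace, the x-axis) converges to the saddle at the origin.
x_saddle = run_gd([1.0, 0.0])
# Any component along the unstable direction lets GD escape.
x_escape = run_gd([1.0, 1e-6])
print(np.linalg.norm(x_saddle))   # ~0: stuck at the saddle
print(np.linalg.norm(x_escape))   # large: escaped along the unstable axis
```

In this 2-D example the attraction region of GD is the 1-dimensional stable subspace, matching the abstract's statement that for GD it has dimension $n-1$; mLSGD shrinks this to $\lfloor (n-1)/2\rfloor$.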
| Author | Kreusser, L. M.; Osher, S. J.; Wang, B. |
| CorporateAuthor | Purdue Univ., West Lafayette, IN (United States); Hysitron, Inc., Minneapolis, MN (United States) |
| DOI | 10.1017/S0956792522000316 |
| Notes | USDOE Office of Science (SC); SC0002722; SC0021142 |
| ORCID | 0000-0002-1131-1125 |
| SubjectTerms | Mathematics |
| URI | https://www.osti.gov/biblio/2419645 |