A deterministic gradient-based approach to avoid saddle points

Bibliographic Details
Published in: European Journal of Applied Mathematics, Vol. 34, Issue 4, pp. 738-757
Authors: Kreusser, L. M. (ORCID: 0000-0002-1131-1125); Osher, S. J.; Wang, B.
Corporate Authors: Purdue Univ., West Lafayette, IN (United States); Hysitron, Inc., Minneapolis, MN (United States)
Format: Journal Article
Language: English
Published: Cambridge University Press, United States, 1 August 2023
Subjects: Mathematics
ISSN: 0956-7925; EISSN: 1469-4425
DOI: 10.1017/S0956792522000316
Funding: USDOE Office of Science (SC); SC0002722, SC0021142
Online access: Full text (OSTI record: https://www.osti.gov/biblio/2419645)

Abstract
Loss functions with a large number of saddle points are one of the major obstacles for training modern machine learning (ML) models efficiently. First-order methods such as gradient descent (GD) are usually the methods of choice for training ML models. However, these methods converge to saddle points for certain choices of initial guesses. In this paper, we propose a modification of the recently proposed Laplacian smoothing gradient descent (LSGD) [Osher et al., arXiv:1806.06317], called modified LSGD (mLSGD), and demonstrate its potential to avoid saddle points without sacrificing the convergence rate. Our analysis is based on the attraction region, formed by all starting points for which the considered numerical scheme converges to a saddle point. We investigate the attraction region's dimension both analytically and numerically. For a canonical class of quadratic functions, we show that the dimension of the attraction region for mLSGD is $\lfloor (n-1)/2\rfloor$, and hence it is significantly smaller than that of GD, whose dimension is $n-1$.
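
The dimension claim for GD can be made concrete with a short worked example. The following is a standard linear-dynamics argument written out for illustration; the diagonal form of the quadratic and the step-size bound are our own assumptions, not an excerpt from the paper:

For $f(x) = \tfrac{1}{2}\,x^\top H x$ with $H = \mathrm{diag}(\lambda_1,\dots,\lambda_n)$,
$\lambda_1 < 0 < \lambda_2 \le \dots \le \lambda_n$, and step size $0 < \eta < 2/\lambda_n$,
GD is the linear iteration
\[
  x_{k+1} = x_k - \eta\,\nabla f(x_k) = (I - \eta H)\,x_k,
  \qquad
  x_k^{(i)} = (1 - \eta\lambda_i)^k\, x_0^{(i)}.
\]
Since $\lambda_1 < 0$ forces $1 - \eta\lambda_1 > 1$, the iterates converge to the saddle
at the origin if and only if $x_0^{(1)} = 0$: the attraction region of GD is the hyperplane
$\{x : x^{(1)} = 0\}$, which has dimension $n-1$.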
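
The record does not reproduce the mLSGD scheme itself, but the base LSGD method of the cited preprint [Osher et al., arXiv:1806.06317] preconditions the gradient with $(I - \sigma L)^{-1}$, where $L$ is the one-dimensional discrete Laplacian with periodic boundary conditions; since that matrix is circulant, the solve reduces to an elementwise division in Fourier space. Below is a minimal Python sketch of one such step under these assumptions; the step size eta, smoothing parameter sigma, and the toy quadratic are illustrative choices, not values from the paper.

import numpy as np

def lsgd_step(w, grad, eta, sigma):
    # Smooth the gradient by solving (I - sigma * L) u = grad, with L the 1-D
    # discrete Laplacian under periodic boundary conditions. The circulant
    # matrix I - sigma * L has Fourier eigenvalues
    # 1 + 2*sigma*(1 - cos(2*pi*k/n)), so the solve is a division in
    # frequency space.
    n = w.size
    eig = 1.0 + 2.0 * sigma * (1.0 - np.cos(2.0 * np.pi * np.arange(n) / n))
    smoothed = np.real(np.fft.ifft(np.fft.fft(grad) / eig))
    return w - eta * smoothed

# Toy quadratic saddle f(x) = 0.5 * x^T diag(lam) x. The initial point lies on
# GD's attraction hyperplane {x : x_1 = 0}, so plain GD would converge to the
# saddle at the origin; the smoothing mixes coordinates and excites the
# unstable direction, so the iterates escape.
lam = np.array([-1.0, 1.0, 2.0, 3.0])
w = np.array([0.0, 1.0, 1.0, 1.0])
for _ in range(100):
    w = lsgd_step(w, lam * w, eta=0.1, sigma=1.0)
print(w)

Setting sigma = 0 recovers plain GD, in which case the iterates above would converge to the saddle; how the modification in mLSGD further shrinks the attraction region is the subject of the paper itself.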

References (entries as recoverable from the record)
Carmon (2019) Gradient descent finds the cubic-regularized nonconvex Newton step. SIAM J. Optim. 29, 2146. doi:10.1137/17M1113898
Lee (2019) First-order methods almost always avoid strict saddle points. Math. Program. 176, 311. doi:10.1007/s10107-019-01374-3
Nesterov (2006) Cubic regularization of Newton method and its global performance. Math. Program. 108, 177. doi:10.1007/s10107-006-0706-8
Paternain (2019) A Newton-based method for nonconvex optimization with fast evasion of saddle points. SIAM J. Optim. 29, 343. doi:10.1137/17M1150116
Sun (2018) A geometric analysis of phase retrieval. Found. Comput. Math. 18, 1131. doi:10.1007/s10208-017-9365-9

Further cited works identified in the record by DOI only:
doi:10.1038/323533a0
doi:10.1145/3055399.3055464
doi:10.1561/2200000006
doi:10.1007/978-981-15-5232-8_14
doi:10.1007/978-981-15-5232-8_47
doi:10.1007/s10107-016-1026-2
doi:10.1007/s10107-018-1335-8
doi:10.1007/s11263-015-0816-y
doi:10.1109/CVPR.2016.90
doi:10.1137/19M1294356