Boosting sharpness-aware training with dynamic neighborhood

Bibliographic Details
Published in: Pattern Recognition, Vol. 153, p. 110496
Main authors: Chen, Junhong; Li, Hong; Chen, C.L. Philip
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.09.2024
Subjects:
ISSN:0031-3203, 1873-5142
Online access: Full text
Description
Summary: Learning algorithms motivated by minimizing the sharpness of the loss surface are an active research topic for improving generalization. Existing methods usually solve a constrained min–max problem to minimize sharpness and find flat minima. However, most constraints (i.e., the neighborhood used to define sharpness) are inappropriate, leading to sub-optimal results. This paper theoretically explores the optimal neighborhood from the viewpoint of the Probably Approximately Correct-Bayesian (PAC-Bayesian) framework. A closed form of the optimal neighborhood is provided; it is determined by the Hessian matrix and the scales of the parameters. A generalization bound is then derived that serves as a guiding principle for the design of the sharpness minimization algorithm. The Dynamic neighborhood-based Sharpness-Aware Minimization algorithm is proposed, which adaptively adjusts the neighborhood during training to gain better performance. The algorithm is also proven to converge at a rate of O(log T / T). Experimental results demonstrate that the proposed algorithm outperforms other methods (e.g., +2.86% accuracy over the baseline on CIFAR-100 for VGG-16).

Highlights:
• A novel sharpness minimization algorithm is proposed to find flat minima.
• The method adaptively adjusts the neighborhood of the sharpness during training.
• Convergence analyses are deduced to guarantee the feasibility of the algorithm.
• The algorithm outperforms other state-of-the-art baselines.
• A new insight into generalization theory and optimization algorithms is provided.
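As background, the sharpness-aware minimization (SAM) update that the abstract builds on can be sketched on a toy quadratic loss. This is an illustrative sketch only: the per-step "dynamic radius" rule below is an assumption for demonstration, not the paper's closed-form Hessian- and scale-dependent neighborhood.

```python
import numpy as np

# Toy quadratic loss: loss(w) = 0.5 * w^T H w, so grad(w) = H w.
H = np.diag([10.0, 0.1])  # ill-conditioned curvature: one sharp, one flat direction

def grad(w):
    return H @ w

def sam_step(w, rho, lr):
    """One SAM-style update: ascend to the neighborhood boundary, then descend."""
    g = grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # first-order worst-case perturbation
    return w - lr * grad(w + eps)                # descend using the perturbed gradient

w = np.array([1.0, 1.0])
for _ in range(200):
    rho = 0.05 * np.abs(w).mean()  # dynamic radius tied to parameter scale (assumption)
    w = sam_step(w, rho, lr=0.05)
```

Tying the radius `rho` to the current parameter scale is one simple way to make the neighborhood "dynamic" during training; the paper derives the optimal choice from a PAC-Bayesian bound instead.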
DOI:10.1016/j.patcog.2024.110496