Boosting sharpness-aware training with dynamic neighborhood
Saved in:
| Published in: | Pattern recognition Vol. 153; p. 110496 |
|---|---|
| Main authors: | , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Ltd, 01.09.2024 |
| Keywords: | |
| ISSN: | 0031-3203, 1873-5142 |
| Online access: | Full text |
| Abstract: | Learning algorithms motivated by minimizing the sharpness of the loss surface are a hot research topic for improving generalization. Existing methods usually solve a constrained min–max problem to minimize sharpness and find flat minima. However, most constraints (i.e., the neighborhood over which the sharpness is measured) are inappropriate, leading to sub-optimal results. This paper theoretically explores the optimal neighborhood from the perspective of the Probably Approximately Correct-Bayesian (PAC-Bayesian) framework. A closed form of the optimal neighborhood is provided; this neighborhood is determined by the Hessian matrix and the scales of the parameters. A generalization bound is then derived that serves as a guiding principle for the design of the sharpness minimization algorithm. The Dynamic neighborhood-based Sharpness-Aware Minimization algorithm is proposed, which adaptively adjusts the neighborhood during training to gain better performance. The algorithm is also proven to converge at a rate of O(log T / T). Experimental results demonstrate that the proposed algorithm outperforms other methods (e.g., accuracy +2.86% over the baseline on CIFAR-100 for VGG-16). |
| Highlights: | • A novel sharpness minimization algorithm is proposed to find flat minima. • The method adaptively adjusts the neighborhood of the sharpness during training. • Convergence analyses are deduced to guarantee the feasibility of the algorithm. • The algorithm outperforms other state-of-the-art baselines. • A new insight into generalization theory and optimization algorithms is provided. |
|---|---|
| DOI: | 10.1016/j.patcog.2024.110496 |
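The min–max scheme described in the abstract (ascend to the worst-case point inside a neighborhood, then descend using the gradient taken there) can be sketched as below. This is a minimal illustration, not the paper's algorithm: the dynamic, Hessian- and parameter-scale-dependent neighborhood is only approximated by a hypothetical per-parameter `scale` vector, and with `scale=None` the sketch reduces to vanilla sharpness-aware minimization (SAM).

```python
import numpy as np

def sam_grad(w, grad_fn, rho=0.05, scale=None):
    """Gradient for one sharpness-aware minimization step.

    `scale` is a hypothetical per-parameter neighborhood scaling standing in
    for the paper's Hessian/scale-dependent neighborhood; None gives plain SAM.
    """
    g = grad_fn(w)
    if scale is None:
        scale = np.ones_like(w)
    gs = g * scale
    # first-order ascent to the (approximate) worst point in the rho-ball
    eps = rho * gs / (np.linalg.norm(gs) + 1e-12)
    # descend using the gradient evaluated at the perturbed weights
    return grad_fn(w + eps)

# toy quadratic loss L(w) = 0.5 * w^T A w, minimized with SGD + SAM gradients
A = np.diag([1.0, 10.0])
loss = lambda w: 0.5 * w @ A @ w
grad = lambda w: A @ w

w = np.array([1.0, 1.0])
for _ in range(200):
    w = w - 0.05 * sam_grad(w, grad, rho=0.05)
```

Note that with a fixed radius `rho` the iterates settle into a small neighborhood of the minimizer rather than converging exactly; shrinking `rho` over training (as a dynamic-neighborhood scheme would) tightens that neighborhood.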