Fast Test Error Rates for Gradient-Based Algorithms on Separable Data
| Published in: | Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998) pp. 7440 - 7444 |
|---|---|
| Main Authors: | , , , |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 14.04.2024 |
| Subjects: | |
| ISSN: | 2379-190X |
| Online Access: | Get full text |
| Summary: | In recent research aimed at understanding the strong generalization performance of simple gradient-based methods on overparameterized models, it has been demonstrated that when training a linear predictor on separable data with an exponentially-tailed loss function, the predictor converges towards the max-margin classifier direction, explaining its resistance to overfitting asymptotically. Moreover, recent findings have shown that overfitting is not a concern even in finite-time scenarios (non-asymptotically), as finite-time generalization bounds have been derived for gradient flow, gradient descent (GD), and stochastic GD. In this work, we extend this line of research and obtain new finite-time generalization bounds for other popular first-order methods, namely normalized GD and Nesterov's accelerated GD. Our results reveal that these methods, as they converge more rapidly in terms of training loss, also exhibit enhanced generalization performance in terms of test error. |
|---|---|
| ISSN: | 2379-190X |
| DOI: | 10.1109/ICASSP48485.2024.10447312 |
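As a rough illustration of the setting described in the summary, the sketch below runs the two first-order methods the paper analyzes, normalized gradient descent and Nesterov's accelerated gradient descent, on the exponentially-tailed logistic loss for a linear predictor over a synthetic linearly separable dataset. The data, step sizes, and iteration counts are illustrative assumptions rather than the paper's experimental setup, and the code is not the authors' implementation.

```python
# Minimal sketch (assumptions only): normalized GD and Nesterov's accelerated GD
# minimizing the exponentially-tailed logistic loss for a linear predictor on a
# synthetic linearly separable dataset.
import numpy as np

rng = np.random.default_rng(0)

# Linearly separable toy data: labels are the sign of a ground-truth direction.
n, d = 200, 20
w_star = rng.normal(size=d)
w_star /= np.linalg.norm(w_star)
X = rng.normal(size=(n, d))
y = np.sign(X @ w_star)

def logistic_loss_grad(w, X, y):
    """Average logistic loss and its gradient for a linear predictor."""
    margins = y * (X @ w)
    loss = np.mean(np.logaddexp(0.0, -margins))      # log(1 + exp(-margin)), stable
    coeff = -np.exp(-np.logaddexp(0.0, margins))     # = -sigmoid(-margin), stable
    grad = (coeff * y) @ X / len(y)                  # average of coeff_i * y_i * x_i
    return loss, grad

def normalized_gd(X, y, lr=1.0, steps=500):
    """Gradient descent with each update normalized by the gradient norm."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        _, g = logistic_loss_grad(w, X, y)
        w -= lr * g / (np.linalg.norm(g) + 1e-12)
    return w

def nesterov_gd(X, y, lr=0.2, steps=500):
    """Nesterov's accelerated gradient descent on the same objective."""
    w = np.zeros(X.shape[1])
    w_prev = w.copy()
    for t in range(1, steps + 1):
        momentum = (t - 1) / (t + 2)
        v = w + momentum * (w - w_prev)              # look-ahead point
        _, g = logistic_loss_grad(v, X, y)
        w_prev, w = w, v - lr * g
    return w

for name, w in [("normalized GD", normalized_gd(X, y)),
                ("Nesterov GD", nesterov_gd(X, y))]:
    loss, _ = logistic_loss_grad(w, X, y)
    train_err = np.mean(np.sign(X @ w) != y)
    print(f"{name}: train loss {loss:.4f}, train error {train_err:.2%}")
```

On separable data both iterates drive the training loss toward zero while the predictor's direction aligns with a large-margin separator; the paper's contribution is finite-time test-error bounds for these methods, which the toy run above does not establish but loosely mirrors.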