Novel and Efficient Approximations for Zero-One and Ranking Losses of Linear Classifiers
| Published in: | Vietnam Journal of Mathematics, Vol. 53, no. 4, pp. 815–834 |
|---|---|
| Main Authors: | |
| Format: | Journal Article |
| Language: | English |
| Published: | Heidelberg: Springer Nature B.V., 01.10.2025 |
| ISSN: | 2305-221X, 2305-2228 |
| Summary: | The predictive quality of machine learning models is typically measured in terms of their (approximate) expected prediction accuracy or the so-called Area Under the Curve (AUC). Minimizing the complements of these measures – the expected risk or the ranking loss – is the goal of supervised learning. However, when the models are constructed by means of empirical risk minimization (ERM), surrogate functions such as the logistic loss or hinge loss are optimized instead. This is done because the empirical approximations of the expected error and the ranking loss are step functions with zero derivatives almost everywhere. In this work, we show that in the case of linear predictors, the expected error and the expected ranking loss can be effectively approximated by smooth functions whose closed-form expressions, and those of their first- (and second-) order derivatives, depend on the mean vector and covariance matrix of the data distribution, which can be precomputed. Hence, the complexity of an optimization algorithm applied to these functions does not depend on the number of samples in the training dataset. These approximation functions are derived under the assumption that the output of the linear classifier for a given dataset has an approximately normal distribution. We present empirical evidence that this assumption is significantly weaker than the Gaussian assumption on the data itself, and we support this claim by demonstrating that our new approximation is quite accurate on datasets that are not necessarily Gaussian. We present computational results showing that our proposed approximations and related optimization algorithms can produce linear classifiers with test accuracy or AUC similar to or better than that obtained using state-of-the-art approaches, in a fraction of the time. |
|---|---|
| DOI: | 10.1007/s10013-025-00767-6 |
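The summary above lends itself to a compact illustration. The sketch below is a hypothetical reconstruction, not the paper's actual formulas: it only assumes, as the abstract states, that the score w·x + b of the linear classifier is approximately normally distributed within each class, so the error and ranking-loss approximations reduce to standard normal CDF expressions in the precomputed per-class moments. All function names and the `prior_p` parameter with its equal-prior default are illustrative assumptions.

```python
# Hypothetical sketch (not the authors' exact formulas): smooth normal-CDF
# approximations of the zero-one loss and the ranking loss (1 - AUC) of a
# linear classifier x -> w @ x + b, assuming the score is approximately
# normal within each class. The data enter only through per-class moments.
import numpy as np
from scipy.stats import norm


def class_moments(X_pos, X_neg):
    """Precompute per-class mean vectors and covariance matrices once."""
    return (X_pos.mean(axis=0), np.cov(X_pos, rowvar=False),
            X_neg.mean(axis=0), np.cov(X_neg, rowvar=False))


def smooth_error(w, b, mu_p, S_p, mu_n, S_n, prior_p=0.5):
    """Normal approximation of the expected zero-one loss.

    A positive sample is misclassified when w @ x + b < 0; under the
    normality assumption this happens with probability
    Phi(-(w @ mu_p + b) / sd_p), and symmetrically for negatives."""
    sd_p = np.sqrt(w @ S_p @ w)
    sd_n = np.sqrt(w @ S_n @ w)
    err_pos = norm.cdf(-(w @ mu_p + b) / sd_p)   # false-negative probability
    err_neg = norm.cdf((w @ mu_n + b) / sd_n)    # false-positive probability
    return prior_p * err_pos + (1.0 - prior_p) * err_neg


def smooth_ranking_loss(w, mu_p, S_p, mu_n, S_n):
    """Normal approximation of the ranking loss (1 - AUC): the probability
    that an independent negative sample scores at least as high as a
    positive one, i.e. Phi of the standardized negative score gap."""
    gap_mean = w @ (mu_p - mu_n)
    gap_sd = np.sqrt(w @ (S_p + S_n) @ w)
    return norm.cdf(-gap_mean / gap_sd)
```

Because both functions are smooth in (w, b) and touch the data only through the precomputed per-class moments, they can be handed to any generic gradient-based optimizer (e.g. scipy.optimize.minimize); the abstract notes that closed-form first- and second-order derivatives are also available, so the per-iteration cost is independent of the training-set size.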