Accuracy and explainability of statistical and machine learning xG models in football
| Title: | Accuracy and explainability of statistical and machine learning xG models in football |
|---|---|
| Authors: | Cefis, Mattia; Carpita, Maurizio |
| Source: | Statistics. 59:426-445 |
| Publisher information: | Informa UK Limited, 2024. |
| Publication year: | 2024 |
| Subjects: | Expected goal, Shapley and rank graduation metrics, predictive accuracy, features explainability, unbalanced binary classification |
| Description: | This study proposes an original approach to the interpretability of the explanatory variables (features) in the well-known expected goals (xG) model for shot analysis in football. To do this, a new original sample of 7801 shots from Italy’s Serie A (1 binary outcome and 26 features) for the 2022/2023 and 2023/2024 seasons was used, in which 8 new features of various types were introduced, integrating event data, performance data, and tracking data. Specifically, the performance of 8 statistical and machine learning (algorithmic) classifiers was compared. The focus was on two key aspects related to the field of explainable Artificial Intelligence (xAI), ‘accuracy’ and ‘explainability’, assessed using appropriate metrics. Considering the accuracy metrics, among the statistical classifiers Binary Regression (BR) with the cloglog link function is the most effective. Among the algorithmic classifiers, XGBoost performs best, though slightly below BR-cloglog. Regarding explainability, the primary contribution to the xG consistently comes from a small set of variables across all classifiers. The most influential features are the proximity to the goal, the shooting angle, and the shooter’s visual angle. |
| Document type: | Article |
| Language: | English |
| ISSN: | 1029-4910 0233-1888 |
| DOI: | 10.1080/02331888.2024.2445305 |
| Accession number: | edsair.doi.dedup.....33d5884be7db9d97313bfaa1a72cea07 |
| Database: | OpenAIRE |
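
The record above names Binary Regression with the complementary log-log (cloglog) link as the most effective statistical classifier. As a minimal sketch of what that link does, and not the authors' implementation, the Python below maps a linear predictor to a goal probability via p = 1 - exp(-exp(eta)). The two features, their names, and all coefficient values are hypothetical and chosen only for illustration; they are not estimates from the paper.

```python
import math


def cloglog_inverse(eta: float) -> float:
    """Inverse complementary log-log link: p = 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))


def logit_inverse(eta: float) -> float:
    """Inverse logit link, shown only for comparison with cloglog."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical coefficients (assumptions for illustration, not fitted values).
INTERCEPT = -1.0
COEF_DISTANCE_M = -0.12  # per metre of distance to goal
COEF_ANGLE_RAD = 1.1     # per radian of shooting angle


def xg_cloglog(distance_m: float, angle_rad: float) -> float:
    """Toy xG: linear predictor passed through the inverse cloglog link."""
    eta = INTERCEPT + COEF_DISTANCE_M * distance_m + COEF_ANGLE_RAD * angle_rad
    return cloglog_inverse(eta)


if __name__ == "__main__":
    # Three made-up shots: (distance in metres, shooting angle in radians).
    for distance_m, angle_rad in [(6.0, 1.0), (11.0, 0.6), (25.0, 0.25)]:
        print(f"d={distance_m:5.1f} m  angle={angle_rad:4.2f} rad  "
              f"xG={xg_cloglog(distance_m, angle_rad):.3f}")
```

Unlike the logit link, the cloglog link is asymmetric around zero: `cloglog_inverse(0.0)` equals 1 - exp(-1) rather than 0.5. That asymmetry is a common reason to consider it for rare-event outcomes such as goals, although this record does not state why the authors chose it.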