Accuracy and explainability of statistical and machine learning xG models in football

Detailed bibliography
Title: Accuracy and explainability of statistical and machine learning xG models in football
Authors: Cefis, Mattia; Carpita, Maurizio
Source: Statistics. 59:426-445
Publisher information: Informa UK Limited, 2024.
Year of publication: 2024
Subjects: Expected goal, Shapley and rank graduation metrics, predictive accuracy, features explainability, unbalanced binary classification
Description: This study proposes an original approach to the interpretability of the explanatory variables (features) in the well-known expected goals (xG) model for shot analysis in football. To do this, a new original sample of 7801 shots from Italy’s Serie A (1 binary outcome and 26 features) for the 2022/2023 and 2023/2024 seasons was used, in which 8 new features of various types were introduced, integrating event data, performance data, and tracking data. Specifically, the performance of 8 statistical and machine learning (algorithmic) classifiers was compared. The focus was on two key aspects related to the field of explainable Artificial Intelligence (xAI), ‘accuracy’ and ‘explainability’, each assessed using appropriate metrics. Considering the accuracy metrics, among the statistical classifiers, Binary Regression (BR) with the cloglog link function is the most effective. Among the algorithmic classifiers, XGBoost performs best, although its accuracy is slightly lower than that of BR-cloglog. Regarding explainability, the primary contribution to the xG consistently comes from a small set of variables across all classifiers. The most influential features are the proximity to the goal, the shooting angle, and the shooter’s visual angle.
Document type: Article
Language: English
ISSN: 1029-4910, 0233-1888
DOI: 10.1080/02331888.2024.2445305
Accession number: edsair.doi.dedup.....33d5884be7db9d97313bfaa1a72cea07
Database: OpenAIRE
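
Note on the cloglog link: the abstract names binary regression with the complementary log-log (cloglog) link as the most accurate statistical classifier. As a reader's aid, the following minimal Python sketch shows what that link does: it maps a linear predictor eta to a probability via p = 1 - exp(-exp(eta)), which, unlike the logistic curve, is asymmetric around p = 0.5. The function names, the single distance feature, and the coefficient values are hypothetical and chosen only for illustration; they are not taken from the article.

```python
import math


def cloglog_inverse(eta: float) -> float:
    """Inverse cloglog link: map a linear predictor to a probability."""
    return 1.0 - math.exp(-math.exp(eta))


def cloglog_link(p: float) -> float:
    """Cloglog link: map a probability in (0, 1) to the linear predictor scale."""
    return math.log(-math.log(1.0 - p))


def logit_inverse(eta: float) -> float:
    """Inverse logit link, shown only for comparison with cloglog."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical coefficients, for illustration only (not estimates from the study).
INTERCEPT = -1.0
BETA_DISTANCE = -0.1  # assumed change in the linear predictor per metre from goal


def toy_xg(distance_m: float) -> float:
    """Toy xG from a one-feature cloglog model with the assumed coefficients."""
    return cloglog_inverse(INTERCEPT + BETA_DISTANCE * distance_m)


if __name__ == "__main__":
    for d in (5.0, 15.0, 25.0):
        print(f"distance={d:>4} m  toy xG={toy_xg(d):.3f}")
```

With a negative distance coefficient, the toy xG falls as the shot moves away from goal. At eta = 0 the cloglog curve gives 1 - exp(-1), roughly 0.632, whereas the logistic curve gives 0.5, which illustrates the asymmetry of the link.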