Data-driven prediction of daily Cryptosporidium river concentrations for water resource management: Use of catchment-averaged vs spatially distributed features in a Bagging-XGBoost model

Bibliographic details
Published in: The Science of the total environment, Volume 991; p. 179794
Main authors: Smalley, Alan L.; Douterelo, Isabel; Chipps, Michael; Shucksmith, James D.
Format: Journal Article
Language: English
Published: Netherlands, Elsevier B.V., 20.08.2025
ISSN: 0048-9697, 1879-1026
Description
Summary: Cryptosporidium is a waterborne pathogen which poses a major challenge to water utilities because of its resistance to chlorination and its infectivity at very low concentrations. The ability to make predictions of Cryptosporidium concentrations in rivers would aid significantly in abstraction-based risk management of water resources, but current models are inappropriate for making predictions at the temporal resolutions required to inform abstraction decision-making. This study utilises Cryptosporidium data collected over 7 years at a major river abstraction site in South East England, alongside publicly available remote sensing data, to train a Bagging-XGBoost model for Cryptosporidium predictive applications at daily timescales. Different combinations of catchment-averaged and spatially distributed datasets were trialled as model inputs. The highest-performing models predicted 69–75 % of >1 oocysts L⁻¹ exceedances, and they also predicted the timing of 78–89 % of higher (>2 oocysts L⁻¹) exceedances. Interpretation of predictions using SHapley Additive exPlanations analysis indicated that sources near the intake (<30 km) were the most important and identified catchment-averaged rainfall at 1- and 2-day lag times and antecedent Cryptosporidium measurements as significant inputs. The study demonstrates the potential of such models when an unparsimonious approach to feature selection is taken, because of their ability to discern non-linear trends and their resistance to multicollinearity and redundancy in the input data. Such models could improve the ability of water utilities to predict Cryptosporidium peaks and aid abstraction decision-making, thereby reducing the loadings of this pathogen to reservoirs and water treatment works.
• Cryptosporidium river concentrations were modelled in a large, complex catchment.
• A range of potential inputs were trialled to explore effects on model performance.
• Final models predicted 69–75 % of >1 oocysts L⁻¹ exceedances.
• Explainable AI methods revealed importance of both animal and human/urban sources.
• Models can inform abstraction strategy to reduce Cryptosporidium loadings to WTWs.
DOI: 10.1016/j.scitotenv.2025.179794
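
Illustrative note: the summary mentions rainfall inputs at 1- and 2-day lag times and reports the share of threshold exceedances that the models predicted. The following is a minimal Python sketch of how such lagged inputs and such a detection rate might be computed. It is not the authors' code: the function names `lagged` and `exceedance_hit_rate`, the sample values, and the exact definition of the rate (observed exceedance days on which the prediction also exceeds the threshold) are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def lagged(series: Sequence[float], lag: int) -> List[Optional[float]]:
    """Shift a daily series back by `lag` days (hypothetical helper).

    The first `lag` entries have no antecedent value and are set to None.
    """
    if lag < 0:
        raise ValueError("lag must be non-negative")
    n = len(series)
    return [None] * min(lag, n) + list(series[: max(n - lag, 0)])


def exceedance_hit_rate(
    observed: Sequence[float], predicted: Sequence[float], threshold: float
) -> float:
    """Fraction of observed exceedance days that were also predicted to exceed.

    Assumed definition: a day counts as an exceedance when the observed
    concentration is strictly above `threshold` (oocysts per litre).
    """
    hits = 0
    total = 0
    for obs, pred in zip(observed, predicted):
        if obs > threshold:
            total += 1
            if pred > threshold:
                hits += 1
    if total == 0:
        raise ValueError("no observed exceedances at this threshold")
    return hits / total


# Made-up daily values, for illustration only.
rainfall_mm = [0.0, 5.2, 12.1, 3.0, 0.0]
rain_lag1 = lagged(rainfall_mm, 1)
rain_lag2 = lagged(rainfall_mm, 2)

observed = [0.2, 0.4, 1.5, 2.6, 0.8, 1.2]
predicted = [0.1, 0.6, 1.3, 2.1, 1.1, 0.9]
rate_above_1 = exceedance_hit_rate(observed, predicted, threshold=1.0)
rate_above_2 = exceedance_hit_rate(observed, predicted, threshold=2.0)
```

A rate defined this way only counts missed peaks; false alarms (predicted exceedances on days without an observed one) would need a separate measure.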