Data-driven prediction of daily Cryptosporidium river concentrations for water resource management: Use of catchment-averaged vs spatially distributed features in a Bagging-XGBoost model
| Published in: | The Science of the total environment, Volume 991; p. 179794 |
|---|---|
| Main authors: | , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Netherlands: Elsevier B.V., 20 August 2025 |
| Subjects: | |
| ISSN: | 0048-9697, 1879-1026 |
| Summary: | Cryptosporidium is a waterborne pathogen which poses a major challenge to water utilities because of its resistance to chlorination and its infectivity at very low concentrations. The ability to make predictions of Cryptosporidium concentrations in rivers would aid significantly in abstraction-based risk management of water resources, but current models are inappropriate for making predictions at the temporal resolutions required to inform abstraction decision-making. This study utilises Cryptosporidium data collected over 7 years at a major river abstraction site in South East England, alongside publicly available remote sensing data, to train a Bagging-XGBoost model for Cryptosporidium predictive applications at daily timescales. Different combinations of catchment-averaged and spatially distributed datasets were trialled as model inputs. The highest-performing models predicted 69–75 % of >1 oocysts L⁻¹ exceedances, and they also predicted the timing of 78–89 % of higher (>2 oocysts L⁻¹) exceedances. Interpretation of predictions using SHapley Additive exPlanations analysis indicated that sources close (<30 km) to the intake were the most important and identified catchment-averaged rainfall at 1 and 2-day lag time and antecedent Cryptosporidium measurements as significant inputs. The study demonstrates the potential of such models when an unparsimonious approach to feature selection is taken, because of their ability to discern non-linear trends and their resistance to multicollinearity and redundancy in the input data. Such models could improve the ability of water utilities to predict Cryptosporidium peaks and aid abstraction decision-making, thereby reducing the loadings of this pathogen to reservoirs and water treatment works. Highlights: • Cryptosporidium river concentrations were modelled in a large, complex catchment. • A range of potential inputs was trialled to explore effects on model performance. • Final models predicted 69–75 % of >1 oocysts L⁻¹ exceedances. • Explainable AI methods revealed importance of both animal and human/urban sources. • Models can inform abstraction strategy to reduce Cryptosporidium loadings to WTWs. |
|---|---|
| DOI: | 10.1016/j.scitotenv.2025.179794 |