Credibility-driven identification of cropland runoff source in surface waters using ANN-XGBoost model ensemble powered by microbial fingerprints

Bibliographic details
Published in: Water Research (Oxford), Vol. 288, Issue Pt B, p. 124692
Main authors: Li, Pengcheng; Dong, Lu; Li, Liping; Xue, Mengzhu; Xia, Guohui; Wang, Kening; Zhang, Xin; Liu, Peng; Zhang, Cheng; Cui, Baoshan; Bai, Junhong; Liu, Xinhui
Format: Journal Article
Language: English
Published: England, Elsevier Ltd, 01.01.2026
ISSN: 0043-1354, 1879-2448
Description
Summary:
• Five microbial fingerprints for cropland runoff source were identified.
• ANN and XGBoost emerged as optimal classifiers using fingerprint data.
• An ANN-XGBoost ensemble model significantly enhanced prediction performance.
• A five-tier credibility classification system was developed for predictions.

High-pollution-load cropland runoff threatens surface water quality globally. Conventional pollution identification relies on pollutant-specific fingerprints, which limits its applicability to non-target contaminants. Here, we developed microbial fingerprinting coupled with machine learning to identify cropland runoff as a pollution source. Through high-throughput sequencing of 386 samples (aquaculture wastewater, domestic sewage, cropland runoff, and orchard runoff), we screened five obligate anaerobic taxa (f_Desulfuromonadaceae, g_Geobacter, f_AKAU3564_sediment_group, o_Dehalococcoidales, and g_Citrifermentans) as microbial fingerprints for the cropland runoff source, exhibiting high sensitivity (0.50–0.62) and specificity (0.81–1.00). Machine learning optimization based on simulated sink datasets identified Artificial Neural Networks (ANN) and eXtreme Gradient Boosting (XGBoost) as the optimal models for fingerprint presence data and relative abundance data, with accuracies of 0.8133 ± 0.0006 and 0.8261 ± 0.0029, respectively. The ANN-XGBoost model ensemble, combined using the logical rule "or", achieved an accuracy of 0.8400 ± 0.0292, outperforming the fingerprint-detection method by 14.69% and the individual classifiers by 2.05%–3.36%. Prediction uncertainty was stratified into five credibility tiers (VHC/HC/MC/LC/NC) based on confidence interval bounds at the 80%, 90%, 95%, and 99% levels and their associated coverage properties. This study provides a reliable method for identifying pollution sources from microbial fingerprint data, unrestricted by specific pollutants.
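The logical "or" ensemble and the sensitivity/specificity figures mentioned in the summary can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the binary label encoding (1 = cropland runoff present, 0 = absent), and the toy predictions are all assumptions made for illustration, and the two input vectors merely stand in for the outputs of the ANN (presence data) and XGBoost (relative abundance data) classifiers.

```python
from typing import Sequence, Tuple


def or_ensemble(pred_a: Sequence[int], pred_b: Sequence[int]) -> list:
    """Combine two binary prediction vectors with a logical OR.

    A sample is flagged positive (1) if either classifier flags it.
    """
    if len(pred_a) != len(pred_b):
        raise ValueError("prediction vectors must have the same length")
    return [int(bool(a) or bool(b)) for a, b in zip(pred_a, pred_b)]


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels.

    sensitivity = TP / (TP + FN), specificity = TN / (TN + FP).
    Assumes both classes occur at least once in y_true.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Toy data (hypothetical, not from the study).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
ann_pred = [1, 0, 0, 0, 0, 0, 1, 0]  # stand-in for the presence-based ANN
xgb_pred = [0, 1, 0, 0, 1, 0, 1, 0]  # stand-in for the abundance-based XGBoost

combined = or_ensemble(ann_pred, xgb_pred)
sens, spec = sensitivity_specificity(y_true, combined)
print(combined, accuracy(y_true, combined), sens, spec)
```

An "or" rule can only add positive calls relative to each individual classifier, so it tends to raise sensitivity at the possible cost of specificity; whether it improves overall accuracy depends on the data.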
DOI:10.1016/j.watres.2025.124692