Credibility-driven identification of cropland runoff source in surface waters using ANN-XGBoost model ensemble powered by microbial fingerprints

Bibliographic details
Published in: Water research (Oxford), Vol. 288; Issue Pt B; p. 124692
Main authors: Li, Pengcheng, Dong, Lu, Li, Liping, Xue, Mengzhu, Xia, Guohui, Wang, Kening, Zhang, Xin, Liu, Peng, Zhang, Cheng, Cui, Baoshan, Bai, Junhong, Liu, Xinhui
Format: Journal Article
Language: English
Published: England, Elsevier Ltd, 01.01.2026
ISSN: 0043-1354, 1879-2448
Description
Summary:
• Five microbial fingerprints for cropland runoff source were identified.
• ANN and XGBoost emerged as optimal classifiers using fingerprint data.
• An ANN-XGBoost ensemble model significantly enhanced prediction performance.
• A five-tier credibility classification system was developed for predictions.

High-pollution-load cropland runoff threatens surface water quality globally. Conventional pollution identification relies on pollutant-specific fingerprints, which limits its applicability to non-target contaminants. Here, we developed microbial fingerprinting coupled with machine learning to identify the cropland runoff source. Through high-throughput sequencing of 386 samples (aquaculture wastewater, domestic sewage, cropland and orchard runoff), we screened five obligate anaerobic taxa (f_Desulfuromonadaceae, g_Geobacter, f_AKAU3564_sediment_group, o_Dehalococcoidales, and g_Citrifermentans) as microbial fingerprints for the cropland runoff source, exhibiting high sensitivity (0.50–0.62) and specificity (0.81–1.00). Machine learning optimization based on simulated sink datasets identified Artificial Neural Networks (ANN) and eXtreme Gradient Boosting (XGBoost) as the optimal models for fingerprint presence data and relative abundance data, with accuracies of 0.8133 ± 0.0006 and 0.8261 ± 0.0029, respectively. The ANN-XGBoost model ensemble using the logical rule “or” achieved an accuracy of 0.8400 ± 0.0292, outperforming the fingerprint-detection method by 14.69% and the individual classifiers by 2.05%–3.36%. Prediction uncertainty was stratified into five credibility tiers (VHC/HC/MC/LC/NC) based on confidence interval bounds at the 80%, 90%, 95%, and 99% levels and their associated coverage properties. This study provides a reliable method for identifying pollution sources using microbial fingerprint data, unrestricted by specific pollutants.
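As an illustration of the logical "or" ensemble rule and the sensitivity/specificity figures mentioned in the summary, the following is a minimal Python sketch. It is not the authors' implementation: the function names, the binary label encoding (1 = cropland runoff source present, 0 = absent), and the toy prediction vectors are all assumptions made for this example, and the two prediction lists merely stand in for the outputs of an ANN and an XGBoost classifier.

```python
from typing import Sequence


def or_ensemble(pred_a: Sequence[int], pred_b: Sequence[int]) -> list[int]:
    """Flag a sample as positive if either classifier flags it (logical OR)."""
    if len(pred_a) != len(pred_b):
        raise ValueError("prediction vectors must have the same length")
    return [int(bool(a) or bool(b)) for a, b in zip(pred_a, pred_b)]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Hypothetical toy data, invented for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
ann_pred = [1, 0, 1, 0, 0, 0, 1, 0]  # stand-in for ANN output on presence data
xgb_pred = [0, 1, 1, 0, 0, 0, 0, 0]  # stand-in for XGBoost output on abundance data

ensemble_pred = or_ensemble(ann_pred, xgb_pred)
ensemble_acc = accuracy(y_true, ensemble_pred)
ensemble_sens, ensemble_spec = sensitivity_specificity(y_true, ensemble_pred)
```

An "or" rule of this kind can only add positive calls relative to either classifier alone, so it tends to raise sensitivity at the possible cost of specificity; whether overall accuracy improves depends on the data.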
DOI: 10.1016/j.watres.2025.124692