Improving prediction of water quality indices using novel hybrid machine-learning algorithms

Detailed bibliography
Published in: The Science of the total environment, Volume 721; p. 137612
Main authors: Bui, Duie Tien, Khosravi, Khabat, Tiefenbacher, John, Nguyen, Hoang, Kazakis, Nerantzis
Format: Journal Article
Language: English
Published: Netherlands, Elsevier B.V., 15 June 2020
ISSN: 0048-9697, 1879-1026
Description
Summary: River water quality assessment is one of the most important tasks for enhancing water resources management plans. A water quality index (WQI) considers several water quality variables simultaneously. Traditional WQI calculations are time-consuming and are often fraught with errors during the derivation of sub-indices. In this study, 4 standalone algorithms (random forest (RF), M5P, random tree (RT), and reduced error pruning tree (REPT)) and 12 hybrid data-mining algorithms (combinations of the standalone algorithms with bagging (BA), CV parameter selection (CVPS), and randomizable filtered classification (RFC)) were used to predict the Iran WQI (IRWQIsc). Six years (2012 to 2018) of monthly data from two water quality monitoring stations within the Talar catchment were compiled. Using Pearson correlation coefficients, 10 different input combinations were constructed. The data were divided into two groups (ratio 70:30) for model building (training dataset) and model validation (testing dataset) using a 10-fold cross-validation technique. The models were evaluated using several statistical and visual evaluation metrics. Results show that fecal coliform (FC) and total solids (TS) had the greatest and least effect, respectively, on the prediction of IRWQIsc. The best input combinations varied among the algorithms; generally, variables with very low correlations displayed weaker performance. Hybrid algorithms improved the prediction power of several of the standalone models, but not all. Hybrid BA-RT outperformed the other models (R² = 0.941, RMSE = 2.71, MAE = 1.87, NSE = 0.941, PBIAS = 0.500). PBIAS indicated that all algorithms, with the exceptions of RT, BA-RT, and CVPS-REPT, overestimated WQI values.
Highlights:
• 16 novel hybrid data-mining algorithms applied for WQI prediction.
• The BA-RT algorithm outperformed the others, while RFC-RT had the lowest prediction power.
• Fecal coliform was the most effective predictor in WQI estimation.
• The best input combination is not the same for all models.
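The summary reports RMSE, MAE, NSE, and PBIAS but does not give their formulas. As a rough illustration only, the sketch below computes them using definitions that are common in hydrological model evaluation. The function name `evaluation_metrics` is hypothetical, and the PBIAS sign convention used here (positive meaning overestimation) is an assumption; the article may define it with the opposite sign.

```python
import math


def evaluation_metrics(observed, predicted):
    """Return RMSE, MAE, NSE and PBIAS for paired observed/predicted values.

    Assumed definitions (not taken from the article):
      RMSE  = sqrt(mean((pred - obs)^2))
      MAE   = mean(|pred - obs|)
      NSE   = 1 - sum((pred - obs)^2) / sum((obs - mean(obs))^2)
      PBIAS = 100 * sum(pred - obs) / sum(obs)   # positive = overestimation
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    sse = sum(e * e for e in errors)                      # sum of squared errors
    sst = sum((o - mean_obs) ** 2 for o in observed)      # variance of observations * n
    total_obs = sum(observed)
    if sst == 0 or total_obs == 0:
        raise ValueError("NSE and PBIAS are undefined for these observations")

    return {
        "rmse": math.sqrt(sse / n),
        "mae": sum(abs(e) for e in errors) / n,
        "nse": 1.0 - sse / sst,
        "pbias": 100.0 * sum(errors) / total_obs,
    }


if __name__ == "__main__":
    # Made-up index values for illustration; not data from the study.
    obs = [50.0, 60.0, 70.0]
    pred = [52.0, 58.0, 73.0]
    for name, value in evaluation_metrics(obs, pred).items():
        print(f"{name}: {value:.3f}")
```

Under these assumed definitions, an NSE of 1 indicates a perfect fit and a PBIAS near 0 indicates little systematic over- or underestimation.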
DOI: 10.1016/j.scitotenv.2020.137612