Comparative analysis of surface water quality prediction performance and identification of key water parameters using different machine learning models based on big data

Detailed bibliography
Published in:	Water research (Oxford) Volume 171; p. 115454
Main authors:	Chen, Kangyang, Chen, Hexia, Zhou, Chuanlong, Huang, Yichao, Qi, Xiangyang, Shen, Ruqin, Liu, Fengrui, Zuo, Min, Zou, Xinyi, Wang, Jinfeng, Zhang, Yan, Chen, Da, Chen, Xingguo, Deng, Yongfeng, Ren, Hongqiang
Format:	Journal Article
Language:	English
Published:	England Elsevier Ltd 15.03.2020
Subjects:	ammonium nitrogen; Big Data; China; data collection; decision support systems; Deep cascade forest; Ensemble methods; forests; Machine Learning; Machine learning models; prediction; surface water; The key water parameters; Water; Water Quality; Water quality prediction
ISSN:	0043-1354, 1879-2448

Description
Summary:	The water quality prediction performance of machine learning models may depend not only on the models themselves but also on the parameters in the data set chosen for training them. Moreover, the key water parameters should also be identified by the learning models in order to further reduce prediction costs and improve prediction efficiency. Here we endeavored, for the first time, to compare the water quality prediction performance of 10 learning models (7 traditional and 3 ensemble models) using big data (33,612 observations) from the major rivers and lakes in China from 2012 to 2018, based on precision, recall, F1-score, and weighted F1-score, and to explore the potential key water parameters for future model prediction. Our results showed that bigger data could improve the performance of learning models in predicting water quality. Compared with the other 7 models, decision tree (DT), random forest (RF) and deep cascade forest (DCF) trained on data sets of pH, DO, CODMn, and NH3–N performed significantly better in predicting all 6 levels of water quality recommended by the Chinese government. Moreover, two key water parameter sets (DO, CODMn, and NH3–N; CODMn and NH3–N) were identified and validated by DT, RF and DCF as having high specificity for water quality prediction. Therefore, DT, RF and DCF with the selected key water parameters could be prioritized for future water quality monitoring and for providing timely water quality warnings.
• Big data could improve the water quality prediction performance of models.
• DCF, the best-performing model, was identified for future water quality prediction.
• Two key water parameter sets were identified for future rapid water monitoring.
DOI:	10.1016/j.watres.2019.115454