High-Precision Prediction of Total Nitrogen Based on Distance Correlation and Machine Learning Models—A Case Study of Dongjiang River, China
| Published in: | Water (Basel), Vol. 17, No. 8, p. 1131 |
|---|---|
| Main authors: | , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Basel: MDPI AG, 10 April 2025 |
| Subjects: | |
| ISSN: | 2073-4441 |
| Summary: | Excessive total nitrogen (TN) in water bodies leads to eutrophication, algal blooms, and hypoxia, which pose significant risks to aquatic ecosystems and human health. Accurate real-time TN prediction is crucial for effective water quality management. This study presents an innovative approach that combines the distance correlation coefficient (DCC) for feature selection with a coupled Attention-Convolutional Neural Network-Bidirectional Long Short-Term Memory (At-CBiLSTM) model to predict TN concentrations in the Dongjiang River in China. A dataset of 28,922 time-series data points was collected from seven sampling sites along the Dongjiang River, spanning from November 2020 to February 2023. The DCC method identified conductivity, Permanganate Index (CODMn), and total phosphorus as the most significant predictors for TN levels. The At-CBiLSTM model, optimized with a time step of three, outperformed other models, including standalone Long Short-Term Memory (LSTM), Bi-directional LSTM (Bi-LSTM), Convolutional Neural Network LSTM (CNN-LSTM), and Attention-LSTM variants, achieving excellent performance with the following metrics: mean absolute error (MAE) = 0.032, mean squared error (MSE) = 0.005, mean absolute percentage error (MAPE) = 0.218, and root mean squared error (RMSE) = 0.045. Importantly, increasing the number of input features beyond three variables led to a decline in model accuracy, underscoring the importance of DCC-driven feature selection. The results highlight that combining DCC with deep learning models, particularly At-CBiLSTM, effectively captures nonlinear temporal dependencies and improves prediction accuracy. This approach provides a solid foundation for real-time water quality monitoring and can inform targeted pollution control strategies in river ecosystems. |
|---|---|
| DOI: | 10.3390/w17081131 |
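
The summary names the distance correlation coefficient (DCC) as the feature-selection step but this record does not include the authors' implementation. As a rough illustration only, the sketch below computes the standard sample distance correlation (Székely's definition) with NumPy and ranks candidate predictors by it. The function names `distance_correlation` and `rank_features` and the synthetic data are assumptions made for this example and are not taken from the article.

```python
import numpy as np


def _double_centered_distances(values):
    """Pairwise Euclidean distance matrix with row, column and grand means removed."""
    v = np.asarray(values, dtype=float)
    v = v.reshape(v.shape[0], -1)  # treat 1-D input as n samples of one variable
    d = np.sqrt(((v[:, None, :] - v[None, :, :]) ** 2).sum(axis=-1))
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation in [0, 1]; 0 is returned if either input is constant."""
    a = _double_centered_distances(x)
    b = _double_centered_distances(y)
    dcov2 = (a * b).mean()      # squared sample distance covariance
    dvar2_x = (a * a).mean()    # squared sample distance variance of x
    dvar2_y = (b * b).mean()    # squared sample distance variance of y
    denom = np.sqrt(dvar2_x * dvar2_y)
    if denom == 0.0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def rank_features(features, target):
    """Return (name, dcor) pairs sorted from strongest to weakest dependence on target.

    `features` is a dict mapping a (hypothetical) feature name to a 1-D array.
    """
    scores = {name: distance_correlation(col, target) for name, col in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Synthetic stand-in data, NOT the Dongjiang River measurements.
    rng = np.random.default_rng(0)
    driver = rng.normal(size=300)
    unrelated = rng.normal(size=300)
    target = driver ** 2 + 0.05 * rng.normal(size=300)  # nonlinear dependence
    for name, score in rank_features({"driver": driver, "unrelated": unrelated}, target):
        print(f"{name}: {score:.3f}")
```

Unlike the Pearson coefficient, distance correlation is zero only under independence, so it can flag a nonlinear relationship (such as the quadratic one in the synthetic example) that a linear correlation would miss. The pairwise distance matrices make this version O(n²) in memory, so a dataset of the size mentioned in the summary would likely need subsampling or a faster algorithm.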