Comparative analysis of seven machine learning algorithms and five empirical models to estimate soil thermal conductivity
| Published in: | Agricultural and Forest Meteorology, Vol. 323, p. 109080 |
|---|---|
| Main authors: | , , , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V., 15 August 2022 |
| Subjects: | |
| ISSN: | 0168-1923, 1873-2240 |
| Summary: | • The accuracy of 7 ML algorithms and 5 thermal conductivity models was evaluated. • The average RMSE values of the ML algorithms were 66% to 82% of the empirical model values. • Soil moisture and bulk density together accounted for more than 80% of the feature importance value. Soil thermal conductivity (λ) is an important thermal property that is crucial for surface energy balance and water balance studies. A total of 1602 measured soil thermal conductivity values representing 189 soils were used to evaluate five empirical models (i.e., the de Vries (1963), Campbell (1985), Johansen (1975), Côté and Konrad (2005), and Lu et al. (2007) models) and seven machine learning (ML) algorithms (i.e., Decision Tree (DT), Random Forest (RF), Gradient Boosting Decision Tree (GBDT), Linear Regression (LR), K-Nearest Neighbors (KNN), Neural Network (NN), and Gaussian Process (GP)) for estimating λ. The results demonstrated that the average root mean squared error (RMSE) values of the ML algorithms were 66% and 82% of the empirical model values on the validation and test sets, respectively. The three best ML algorithms (GBDT, NN, RF) performed significantly better than the three best empirical models (Lu et al. 2007; Côté and Konrad 2005; Johansen 1975): 0.183 < RMSE < 0.259 W m⁻¹ K⁻¹ for the ML algorithms and 0.293 < RMSE < 0.320 W m⁻¹ K⁻¹ for the empirical models. For ML, we recommend the GBDT, NN, and RF algorithms. For empirical models, we recommend using the three normalized models (Lu et al. 2007; Côté and Konrad 2005; Johansen 1975) over the physically based model (DV1963) and the regression model (CG1985). The feature importance rankings produced by the RF and GBDT algorithms show that soil moisture content and soil bulk density are the most critical factors affecting λ. Together, soil moisture content and soil bulk density account for more than 80% of the feature importance value for λ. RF gives more consistent feature importance rankings than GBDT; therefore, we recommend RF for feature selection. |
|---|---|
| DOI: | 10.1016/j.agrformet.2022.109080 |
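
The summary compares the two model families by expressing the ML RMSE as a percentage of the empirical-model RMSE. A minimal sketch of that comparison follows, using only the Python standard library. The function names `rmse` and `rmse_ratio` and all numeric values are hypothetical and invented for illustration; they are not data or code from the article.

```python
import math


def rmse(measured, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_ratio(measured, ml_predicted, empirical_predicted):
    """RMSE of the ML estimates as a fraction of the empirical-model RMSE."""
    return rmse(measured, ml_predicted) / rmse(measured, empirical_predicted)


# Hypothetical thermal conductivity values in W m^-1 K^-1 (illustration only).
measured = [0.50, 1.00, 1.50, 2.00]
ml_predicted = [0.60, 0.90, 1.60, 1.90]         # each estimate is off by 0.1
empirical_predicted = [0.70, 0.80, 1.70, 1.80]  # each estimate is off by 0.2

ratio = rmse_ratio(measured, ml_predicted, empirical_predicted)
print(f"ML RMSE is {ratio:.0%} of the empirical-model RMSE")
```

A ratio below 1 means the ML estimates have the lower error on that data set; the 66% and 82% figures reported in the summary are ratios of this kind, averaged over the algorithms and models studied.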