Landslide Susceptibility Assessment via Imbalanced Data Augmentation with Tabular Variational Autoencoder and Quality–Diversity Post-Selection
| Published in: | Applied Sciences, Vol. 15, No. 22, p. 11965 |
|---|---|
| Main authors: | , , , , , , |
| Format: | Journal Article |
| Language: | English |
| Publication details: | Basel: MDPI AG, 01.11.2025 |
| Subjects: | |
| ISSN: | 2076-3417 |
| Summary: | Landslides are among the most common geological hazards in mountainous regions, posing significant threats to resident safety and infrastructure stability. Due to the complexity of terrain and the difficulty of field surveys, landslide samples in these areas often suffer from class imbalance, which undermines the accuracy of susceptibility models. To address this issue, this study constructed a multi-factor landslide database and employed a Tabular Variational Autoencoder (TVAE) to generate synthetic samples. A Quality–Diversity (QD) screening strategy was further integrated to enhance the representativeness and diversity of the augmented data. Experimental results demonstrate that the proposed TVAE–QD method improves model performance, with generated samples showing distributions closer to real data. Compared with the Synthetic Minority Over-sampling Technique (SMOTE) and unfiltered TVAE, the TVAE–QD method achieved higher predictive accuracy and exhibited greater robustness under progressive data augmentation. In the Random Forest (RF) model, TVAE–QD achieved its best performance at a scale of 350, with an Area Under the Curve (AUC) of 0.923 and a Precision–Recall AUC (PR–AUC) of 0.907, outperforming TVAE and SMOTE. In the Light Gradient Boosting Machine (LightGBM) model, the AUC peaked at 0.911 at a scale of 450, while the PR–AUC reached its maximum of 0.896 at a scale of 200. Shapley Additive Explanations (SHAP) analysis confirmed that data augmentation preserved interpretability: dominant factors such as elevation, rainfall, and the Normalized Difference Vegetation Index (NDVI) remained stable, with only minor adjustments among secondary variables. Overall, the TVAE–QD framework effectively mitigates class imbalance and offers a promising technical solution for landslide risk assessment in mountainous regions. |
|---|---|
| DOI: | 10.3390/app152211965 |
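
The summary mentions a Quality–Diversity (QD) screening step applied to TVAE-generated samples, but this record does not state the criteria the authors used. The sketch below is only one plausible reading, under stated assumptions: quality is taken to be closeness to the nearest real sample, and diversity is enforced by greedy farthest-point selection. The names `qd_select`, `nearest_real_distance`, and `keep_fraction` are hypothetical and do not come from the article.

```python
import math


def nearest_real_distance(sample, real):
    """Euclidean distance from one synthetic row to its closest real row."""
    return min(math.dist(sample, r) for r in real)


def qd_select(real, synthetic, k, keep_fraction=0.5):
    """Return indices of up to k synthetic rows: quality filter, then diversity.

    Assumption (not from the article): quality = small distance to real data,
    diversity = large distance to rows that were already selected.
    """
    if not real or not synthetic or k <= 0:
        return []

    # Quality step: rank synthetic rows by closeness to the real data
    # and keep only the best-ranked share as candidates.
    ranked = sorted(
        range(len(synthetic)),
        key=lambda i: nearest_real_distance(synthetic[i], real),
    )
    n_keep = max(k, math.ceil(keep_fraction * len(synthetic)))
    candidates = ranked[:n_keep]

    # Diversity step: start from the best candidate, then repeatedly add
    # the candidate whose minimum distance to the selected set is largest.
    selected = [candidates[0]]
    while len(selected) < min(k, len(candidates)):
        remaining = [i for i in candidates if i not in selected]
        best = max(
            remaining,
            key=lambda i: min(
                math.dist(synthetic[i], synthetic[j]) for j in selected
            ),
        )
        selected.append(best)
    return selected


# Toy data: two real rows, four synthetic rows (features assumed pre-scaled).
real = [(0.0, 0.0), (1.0, 1.0)]
synthetic = [(0.1, 0.0), (0.12, 0.0), (1.0, 0.95), (5.0, 5.0)]
print(qd_select(real, synthetic, k=2, keep_fraction=0.75))  # prints [2, 0]
```

In this toy run the outlier at index 3 is dropped by the quality step, and the near-duplicate at index 1 loses to index 0 in the diversity step. Real landslide factors would need scaling or encoding before any distance is computed, which this sketch leaves out.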