Landslide Susceptibility Assessment via Imbalanced Data Augmentation with Tabular Variational Autoencoder and Quality–Diversity Post-Selection

Detailed bibliography
Published in: Applied Sciences, Vol. 15, No. 22, p. 11965
Main authors: Xu, Zhengyang; Wang, Shitai; Yin, Min; Zhang, Xiaoyu; Lu, Zengyang; Yu, Songchao; Huang, Junjun
Medium: Journal Article
Language: English
Published: Basel: MDPI AG, 1 November 2025
ISSN: 2076-3417
Description
Abstract: Landslides are among the most common geological hazards in mountainous regions, posing significant threats to resident safety and infrastructure stability. Due to the complexity of terrain and the difficulty of field surveys, landslide samples in these areas often suffer from class imbalance, which undermines the accuracy of susceptibility models. To address this issue, this study constructed a multi-factor landslide database and employed a Tabular Variational Autoencoder (TVAE) to generate synthetic samples. A Quality–Diversity (QD) screening strategy was further integrated to enhance the representativeness and diversity of the augmented data. Experimental results demonstrate that the proposed TVAE–QD method improves model performance, with generated samples showing distributions closer to real data. Compared with the Synthetic Minority Over-sampling Technique (SMOTE) and unfiltered TVAE, the TVAE–QD method achieved higher predictive accuracy and exhibited greater robustness under progressive data augmentation. In the Random Forest (RF) model, the TVAE–QD achieved its best performance at a scale of 350, with an Area Under the Curve (AUC) of 0.923 and a Precision–Recall AUC (PR–AUC) of 0.907, outperforming TVAE and SMOTE. In the Light Gradient Boosting Machine (LightGBM) model, the AUC peaked at 0.911 at a scale of 450, while the PR–AUC reached its maximum of 0.896 at a scale of 200. Shapley Additive Explanations (SHAP) analysis confirmed that data augmentation preserved interpretability: dominant factors such as elevation, rainfall, and the Normalized Difference Vegetation Index (NDVI) remained stable, with only minor adjustments among secondary variables. Overall, the TVAE–QD framework effectively mitigates class imbalance and offers a promising technical solution for landslide risk assessment in mountainous regions.
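SMOTE, the baseline the abstract compares against, creates synthetic minority-class samples by picking a minority sample, choosing one of its k nearest minority-class neighbours, and placing a new point at a random position on the line segment between them. The sketch below is a minimal, standard-library-only illustration of that interpolation step, assuming purely numeric feature vectors. It is not the authors' code, and the names `smote`, `k`, and `seed` are illustrative choices. It does not attempt to reproduce TVAE or the QD screening, whose details are not given in this record.

```python
import math
import random
from typing import List, Sequence


def _euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority: List[List[float]], n_new: int, k: int = 5,
          seed: int = 0) -> List[List[float]]:
    """Hypothetical helper: SMOTE-style interpolation over numeric features.

    Each synthetic row is base + gap * (neighbour - base), with one gap
    drawn from U(0, 1) per row, so it lies on the segment between a
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # Precompute the k nearest minority neighbours of every minority sample.
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: _euclidean(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))   # random base sample
        j = rng.choice(neighbours[i])      # random neighbour among its k nearest
        gap = rng.random()                 # position along the segment, in [0, 1)
        base, nb = minority[i], minority[j]
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


if __name__ == "__main__":
    # Toy, made-up rows (not data from the study): three numeric factors each.
    toy_minority = [[0.2, 0.5, 0.1], [0.3, 0.4, 0.2], [0.25, 0.6, 0.15]]
    for row in smote(toy_minority, n_new=3, k=2, seed=42):
        print([round(v, 3) for v in row])
```

Because each new row is an interpolation, plain SMOTE is only meaningful for numeric features; categorical conditioning factors, if present, would need a variant such as SMOTE-NC or a different generator. Interpolation also stays inside the convex hull of existing minority samples, which is one way it can differ from a learned generative model.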
DOI: 10.3390/app152211965