Landslide Susceptibility Assessment via Imbalanced Data Augmentation with Tabular Variational Autoencoder and Quality–Diversity Post-Selection


Bibliographic details
Published in: Applied Sciences, Vol. 15, No. 22, p. 11965
Main authors: Xu, Zhengyang; Wang, Shitai; Yin, Min; Zhang, Xiaoyu; Lu, Zengyang; Yu, Songchao; Huang, Junjun
Format: Journal Article
Language: English
Published: Basel, MDPI AG, 1 November 2025
ISSN: 2076-3417
Description
Abstract: Landslides are among the most common geological hazards in mountainous regions, posing significant threats to resident safety and infrastructure stability. Due to the complexity of terrain and the difficulty of field surveys, landslide samples in these areas often suffer from class imbalance, which undermines the accuracy of susceptibility models. To address this issue, this study constructed a multi-factor landslide database and employed a Tabular Variational Autoencoder (TVAE) to generate synthetic samples. A Quality–Diversity (QD) screening strategy was further integrated to enhance the representativeness and diversity of the augmented data. Experimental results demonstrate that the proposed TVAE–QD method improves model performance, with generated samples showing distributions closer to real data. Compared with the Synthetic Minority Over-sampling Technique (SMOTE) and unfiltered TVAE, the TVAE–QD method achieved higher predictive accuracy and exhibited greater robustness under progressive data augmentation. In the Random Forest (RF) model, TVAE–QD achieved its best performance at a scale of 350, with an Area Under the Curve (AUC) of 0.923 and a Precision–Recall AUC (PR–AUC) of 0.907, outperforming TVAE and SMOTE. In the Light Gradient Boosting Machine (LightGBM) model, the AUC peaked at 0.911 at a scale of 450, while the PR–AUC reached its maximum of 0.896 at a scale of 200. Shapley Additive Explanations (SHAP) analysis confirmed that data augmentation preserved interpretability: dominant factors such as elevation, rainfall, and the Normalized Difference Vegetation Index (NDVI) remained stable, with only minor adjustments among secondary variables. Overall, the TVAE–QD framework effectively mitigates class imbalance and offers a promising technical solution for landslide risk assessment in mountainous regions.
DOI: 10.3390/app152211965