Advancing LightGBM with data augmentation for predicting the residual strength of corroded pipelines

Bibliographic Details
Published in: npj Materials Degradation, Vol. 9, no. 1, pp. 128 - 12
Main Authors: Wang, Qiankun, Lu, Hongfang, Li, Fan, Cheng, Y. Frank
Format: Journal Article
Language: English
Published: London: Nature Publishing Group UK, 22.10.2025
Nature Publishing Group
Nature Portfolio
ISSN: 2397-2106
Description
Summary: Machine learning methods have been widely applied to predicting the residual strength of corroded pipelines because of their strong predictive capabilities. However, the effective application of these techniques is constrained by the limited availability of high-quality data, as traditional pipeline burst tests are both costly and time-consuming. This study addresses the data limitation by applying and comparing three data augmentation models, the Tabular Variational Autoencoder (TVAE), the Copula Generative Adversarial Network (CopulaGAN), and the Conditional Tabular Generative Adversarial Network (CTGAN), to enlarge the corroded pipeline dataset. The augmented datasets were used to train a LightGBM model for residual strength prediction. Of the three, CopulaGAN augmentation yielded the largest improvement, increasing the LightGBM model’s R² by 4.46%. Additionally, SHapley Additive exPlanations (SHAP) analysis was conducted on the CopulaGAN-LightGBM model to interpret feature importance, identifying wall thickness, defect depth, and pipe diameter as the most influential factors affecting residual strength. Finally, a practical online platform implementing the proposed model has been developed to enable real-time residual strength prediction. The results demonstrate that combining LightGBM with effective data augmentation provides a reliable way to overcome data limitations in pipeline corrosion assessment.
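As a reading aid for the R² figure quoted in the summary, the sketch below shows how a coefficient of determination and a percentage gain between two models could be computed. It is a minimal illustration, not the authors' evaluation code. The burst-pressure values are hypothetical, the function names `r2_score` and `relative_gain_percent` are introduced here for illustration, and the summary does not say whether the 4.46% is a relative or an absolute change, so the relative form used here is an assumption.

```python
from statistics import fmean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_true = fmean(y_true)
    # Residual sum of squares: error left over after the model's prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the measured values around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def relative_gain_percent(baseline, improved):
    """Relative change from baseline to improved, in percent (assumed form)."""
    return 100.0 * (improved - baseline) / baseline


# Hypothetical burst pressures in MPa; these are NOT data from the paper.
measured = [10.0, 12.0, 14.0, 16.0, 18.0]
pred_baseline = [11.0, 11.0, 15.0, 15.0, 19.0]   # model trained on real data only
pred_augmented = [10.5, 11.5, 14.5, 15.5, 18.5]  # model trained on augmented data

r2_base = r2_score(measured, pred_baseline)
r2_aug = r2_score(measured, pred_augmented)
gain = relative_gain_percent(r2_base, r2_aug)

print(f"baseline R^2  = {r2_base:.5f}")
print(f"augmented R^2 = {r2_aug:.5f}")
print(f"relative gain = {gain:.2f}%")
```

In a comparison of this kind, both models would normally be scored on the same held-out set of real (non-synthetic) samples, so that the gain reflects the effect of augmentation rather than a change in test data.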
DOI: 10.1038/s41529-025-00673-9