Ensemble learning predicts glass-forming ability under imbalanced datasets

Bibliographic details
Published in: Computational Materials Science, Vol. 248, Article 113601
Main authors: Cheng, Duan-jie; Liang, Yong-chao; Pu, Yuan-wei; Chen, Qian
Format: Journal Article
Language: English
Published: Elsevier B.V., 1 February 2025
ISSN: 0927-0256
Description
Summary:
Graphical abstract (flowchart of the study; image omitted): In the data preprocessing stage, two data enhancement strategies, WERCS and SMOGN, were employed and evaluated by PCD. For model construction, twelve commonly used ML models were collected and screened using R2 and CI. Further, the composition of the MLS models was optimized using BOA. The results show that the R2 and RMSE of the MLS models on the test set are 0.79 and 4.29, respectively. In addition, the generalization ability of the MLS models was verified in the Cu-Mg-Ca alloy system.
Highlights:
• Multi-layer stacking ensemble learning with multi-model fusion for predicting GFA.
• WERCS and SMOGN data enhancement strategies to address data imbalance.
• A Bayesian optimization algorithm is used to select the composition of models.
• Achieved a prediction accuracy (R2) of 0.79, surpassing the other models considered.
Abstract: With the development of artificial intelligence, machine learning (ML) is widely used to predict glass-forming ability (GFA). However, experimental GFA data usually exhibit a long-tailed distribution, and the similarity between an enhanced dataset and the original dataset is unclear. In terms of modeling, although model fusion gives better predictions than individual learners, it also carries a risk of overfitting. Therefore, two preprocessing methods designed for regression problems are employed: the WEighted Relevance-based Combination Strategy (WERCS) and the Synthetic Minority Over-sampling Technique with Gaussian Noise (SMOGN). The better strategy is selected by pairwise correlation difference (PCD). Based on the screening results, this paper further proposes a multi-layer stacking ensemble learning model (MLS) for predicting GFA. Considering model accuracy and diversity together, the combinations of base models and meta-models are optimized by a Bayesian optimization algorithm (BOA).
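Pairwise correlation difference (PCD) is commonly defined as the Frobenius norm of the difference between the feature correlation matrices of the original and the enhanced dataset, PCD = ||Corr(X_original) - Corr(X_enhanced)||_F, so a smaller value means the enhanced data better preserve the original correlation structure. This record does not give the authors' exact formulation; the Python sketch below assumes this common definition, and the helper names (`pearson`, `corr_matrix`, `pcd`, `select_strategy`) and the toy data are hypothetical, not taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes neither is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def corr_matrix(rows):
    """Feature-by-feature correlation matrix of a dataset given as a list of rows."""
    cols = list(zip(*rows))
    return [[pearson(ci, cj) for cj in cols] for ci in cols]


def pcd(original, enhanced):
    """Frobenius norm of the difference between the two correlation matrices."""
    a, b = corr_matrix(original), corr_matrix(enhanced)
    k = len(a)
    return math.sqrt(sum((a[i][j] - b[i][j]) ** 2 for i in range(k) for j in range(k)))


def select_strategy(original, candidates):
    """Return the name of the enhanced dataset with the lowest PCD."""
    return min(candidates, key=lambda name: pcd(original, candidates[name]))


# Hypothetical toy data with two features per row; not from the paper.
original = [[1, 2], [2, 4], [3, 6], [4, 7]]
candidates = {
    "augmented_a": original + [[5, 9]],  # synthetic row follows the trend
    "augmented_b": original + [[5, 0]],  # synthetic row breaks the trend
}
print(select_strategy(original, candidates))  # prints augmented_a
```

Because PCD compares only linear correlations between features, it measures how faithfully an enhanced dataset keeps the original structure; it does not by itself show whether the long-tailed target distribution has been rebalanced.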
The results show that MLS achieves R2 = 0.79 on the test set, outperforming the other models and criteria discussed in this paper. In addition, the generalization ability of the MLS model is verified on the Cu-Mg-Ca alloy system. To explain the MLS model, SHapley Additive exPlanations (SHAP) is introduced. With the help of MLS and SHAP, the formation rules of bulk metallic glasses (BMGs) are revealed, and BMGs in the Zr-Cu-Al-Ag alloy series are successfully designed.
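SHAP explains a single prediction by assigning each input feature its Shapley value: the feature's weighted average marginal contribution over all subsets of the other features. The sketch below computes these values exactly by enumeration, assuming that "absent" features are replaced by one baseline vector. That is a simplification chosen for illustration; the SHAP library averages over background data and uses faster estimators, and this record does not say which explainer the authors used. The function `shapley_values` and the toy model are hypothetical.

```python
import math
from itertools import combinations


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x), with absent features set to baseline.

    Enumerates every feature subset, so it is only practical for a few features.
    """
    n = len(x)

    def value(subset):
        # Features in `subset` take their values from x; the rest from baseline.
        z = [x[j] if j in subset else baseline[j] for j in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for every subset S of this size.
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for s in combinations(others, size):
                s = set(s)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical toy model with an interaction between features 0 and 2.
def toy_model(z):
    return 2 * z[0] + 3 * z[1] + z[0] * z[2]


phi = shapley_values(toy_model, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
print([round(v, 6) for v in phi])  # prints [2.5, 3.0, 0.5]
```

The attributions sum to toy_model(x) - toy_model(baseline), here 6.0, which is the efficiency property SHAP also satisfies; the interaction term z[0] * z[2] is split equally between features 0 and 2.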
DOI: 10.1016/j.commatsci.2024.113601