A granular XGBoost classification algorithm


Full description

Bibliographic Details
Published in: Applied intelligence (Dordrecht, Netherlands), Vol. 55, Issue 13, p. 895
Main Authors: Lan, Biyun; Chen, Yumin; Wu, Keshou
Format: Journal Article
Language: English
Published: New York: Springer US, 01.08.2025
Springer Nature B.V.
Subjects:
ISSN: 0924-669X, 1573-7497
Online Access: Full text
Description
Summary: This paper proposes the first integration of multi-distance granular computing with XGBoost, significantly improving generalization in small-sample scenarios through enriched feature representations. XGBoost (eXtreme Gradient Boosting) is a machine learning algorithm primarily utilized to address classification and regression problems. It typically relies on the original features of the data or basic feature engineering, which does not fully exploit the internal structures and correlations inherent in the data. Moreover, owing to data sparsity, category imbalance, and noise in small-sample datasets, its performance often deteriorates, reducing the model's generalization ability. In this paper, we propose a granular XGBoost classification algorithm designed to enhance classification accuracy on small-sample datasets. The algorithm first extends the dimensionality of the dataset by combining multiple features, then applies various distance metrics for granulation processing, generating data representations at multiple granularity levels and thereby producing richer and more diverse feature representations. The feature representations at different granularity levels are then fused separately and fed into the XGBoost classifier for training and prediction. This approach mitigates the bias and error associated with a single metric, thereby achieving a better balance in the representations of each category within the dataset.
Experimental results indicate that the granular XGBoost classification algorithm, which utilizes the multi-distance metric method, not only achieves significantly improved classification accuracy across various datasets compared to the traditional XGBoost algorithm, but also reinforces robustness and generalization capability across different datasets and feature distributions, providing novel insights and methodologies for addressing the overfitting challenges associated with small-sample data.
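The pipeline the abstract describes — granulating the data under several distance metrics, fusing the resulting representations with the original features, and training a boosted-tree classifier — can be sketched roughly as follows. This is a minimal illustration, not the authors' method: the specific metrics (Euclidean, Manhattan, cosine), the use of class centroids as granulation anchors, and the substitution of scikit-learn's GradientBoostingClassifier for XGBoost are all assumptions made for the sake of a self-contained example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

def granulate(X, centroids):
    """Multi-distance granulation (illustrative): for each sample, compute
    its distance to every class centroid under several metrics and use
    those distances as additional features."""
    feats = []
    for c in centroids:
        diff = X - c
        feats.append(np.linalg.norm(diff, axis=1))  # Euclidean distance
        feats.append(np.abs(diff).sum(axis=1))      # Manhattan distance
        cos = (X @ c) / (np.linalg.norm(X, axis=1) * np.linalg.norm(c) + 1e-12)
        feats.append(1.0 - cos)                     # cosine distance
    return np.column_stack(feats)

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

# Granulation anchors: class centroids estimated from the training split only.
centroids = [X_tr[y_tr == k].mean(axis=0) for k in np.unique(y_tr)]

# Fuse the original features with the multi-distance granular features.
X_tr_g = np.hstack([X_tr, granulate(X_tr, centroids)])
X_te_g = np.hstack([X_te, granulate(X_te, centroids)])

# Stand-in for XGBoost; swap in xgboost.XGBClassifier if available.
clf = GradientBoostingClassifier(random_state=0).fit(X_tr_g, y_tr)
acc = clf.score(X_te_g, y_te)
```

In practice the paper fuses representations from multiple granularity levels rather than a single augmented matrix, but the sketch conveys the core idea: distance-based granular features enrich the representation that the boosted classifier sees.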
DOI: 10.1007/s10489-025-06762-1