A granular XGBoost classification algorithm
| Published in: | Applied intelligence (Dordrecht, Netherlands) Vol. 55; no. 13; p. 895 |
|---|---|
| Main Authors: | , , |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: Springer US, 01.08.2025; Springer Nature B.V |
| Subjects: | |
| ISSN: | 0924-669X, 1573-7497 |
| Online Access: | Get full text |
| Summary: | This paper proposes the first integration of multi-distance granular computing with XGBoost, significantly improving generalization in small-sample scenarios through enriched feature representations. XGBoost (eXtreme Gradient Boosting) is a machine learning algorithm primarily used for classification and regression problems. It typically relies on the original features of the data or on basic feature engineering, which does not fully exploit the internal structures and correlations inherent in the data. Moreover, because small-sample datasets suffer from data sparsity, class imbalance, and noise, its performance often deteriorates, reducing the model’s generalization ability. In this paper, we propose a granular XGBoost classification algorithm designed to improve classification accuracy on small-sample datasets. The algorithm first extends the dimensionality of the dataset by combining multiple features and then applies several distance metrics for granulation, generating data representations at multiple granularity levels and thereby producing richer and more diverse feature representations. The feature representations at the different granularity levels are fused separately and fed into the XGBoost classifier for training and prediction. This approach mitigates the bias and error associated with any single metric, achieving a better balance in the representation of each category within the dataset. Experimental results indicate that the classification accuracy of the granular XGBoost algorithm using the multi-distance metric method is not only significantly higher than that of the traditional XGBoost algorithm across various datasets, but the method also shows stronger robustness and generalization across different datasets and feature distributions, offering new insights and methodologies for addressing the overfitting challenges of small-sample data. (An illustrative code sketch of this pipeline follows the record table below.) |
|---|---|
| ISSN: | 0924-669X 1573-7497 |
| DOI: | 10.1007/s10489-025-06762-1 |
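
The summary describes a pipeline that augments the original features with multi-distance "granular" representations before training XGBoost. The Python sketch below is only an illustration of that general idea under stated assumptions: per-class centroids are used as anchor points, Euclidean/Manhattan/Chebyshev distances stand in for the paper's distance metrics, and simple concatenation stands in for the fusion step. It is not the authors' implementation.

```python
# Hypothetical sketch of multi-distance granulation features feeding XGBoost.
# Anchor choice, metric set, and hstack fusion are assumptions, not the paper's method.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import pairwise_distances
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier


def granulate(X, anchors, metrics=("euclidean", "manhattan", "chebyshev")):
    """Distance of every sample to each anchor point under several metrics."""
    blocks = [pairwise_distances(X, anchors, metric=m) for m in metrics]
    return np.hstack(blocks)


# Small training split to mimic the small-sample setting.
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, train_size=0.2, stratify=y, random_state=0
)

# Anchors: per-class centroids of the training split (an assumption).
anchors = np.vstack([X_tr[y_tr == c].mean(axis=0) for c in np.unique(y_tr)])

# Fuse the original (fine-grained) features with the multi-distance granular features.
X_tr_g = np.hstack([X_tr, granulate(X_tr, anchors)])
X_te_g = np.hstack([X_te, granulate(X_te, anchors)])

clf = XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1)
clf.fit(X_tr_g, y_tr)
print("test accuracy:", (clf.predict(X_te_g) == y_te).mean())
```

In this sketch, distances to class centroids computed under several metrics act as a coarse, multi-view representation, while concatenating them with the raw features preserves the fine-grained view; using more than one metric is what hedges against the bias of any single distance measure mentioned in the summary.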