A granular XGBoost classification algorithm
| Published in: | Applied intelligence (Dordrecht, Netherlands) Vol. 55; no. 13; p. 895 |
|---|---|
| Main Authors: | , , |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: Springer US, 01.08.2025; Springer Nature B.V |
| Subjects: | |
| ISSN: | 0924-669X, 1573-7497 |
| Online Access: | Get full text |
| Summary: | This paper proposes the first integration of multi-distance granular computing with XGBoost, significantly improving generalization in small-sample scenarios through enriched feature representations. XGBoost (eXtreme Gradient Boosting) is a machine learning algorithm primarily used for classification and regression problems. It typically relies on the original features of the data or on basic feature engineering, which does not fully exploit the internal structures and correlations inherent in the data. Moreover, because small-sample datasets suffer from data sparsity, class imbalance, and noise, its performance often deteriorates, reducing the model’s generalization ability. In this paper, we propose a granular XGBoost classification algorithm designed to improve classification accuracy on small-sample datasets. The algorithm first extends the dimensionality of the dataset by combining multiple features and then applies several distance metrics for granulation, generating data representations at multiple granularity levels and thereby producing richer and more diverse feature representations. The feature representations at the different granularity levels are fused separately and fed into the XGBoost classifier for training and prediction. This approach mitigates the bias and error associated with any single metric, achieving a better balance in the representation of each category within the dataset. Experimental results indicate that the classification accuracy of the granular XGBoost algorithm using the multi-distance metric method is not only significantly higher than that of the traditional XGBoost algorithm across various datasets, but the method also shows stronger robustness and generalization across different datasets and feature distributions, offering new insights and methodologies for addressing the overfitting challenges of small-sample data. (An illustrative code sketch of this pipeline follows the record table below.) |
|---|---|
| ISSN: | 0924-669X 1573-7497 |
| DOI: | 10.1007/s10489-025-06762-1 |
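
The summary describes a pipeline that augments the original features with multi-distance "granular" representations before training XGBoost. The Python sketch below is only an illustration of that general idea under stated assumptions: per-class centroids are used as anchor points, Euclidean/Manhattan/Chebyshev distances stand in for the paper's distance metrics, and simple concatenation stands in for the fusion step. It is not the authors' implementation.

```python
# Hypothetical sketch of multi-distance granulation features feeding XGBoost.
# Anchor choice, metric set, and hstack fusion are assumptions, not the paper's method.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import pairwise_distances
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier


def granulate(X, anchors, metrics=("euclidean", "manhattan", "chebyshev")):
    """Distance of every sample to each anchor point under several metrics."""
    blocks = [pairwise_distances(X, anchors, metric=m) for m in metrics]
    return np.hstack(blocks)


# Small training split to mimic the small-sample setting.
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, train_size=0.2, stratify=y, random_state=0
)

# Anchors: per-class centroids of the training split (an assumption).
anchors = np.vstack([X_tr[y_tr == c].mean(axis=0) for c in np.unique(y_tr)])

# Fuse the original (fine-grained) features with the multi-distance granular features.
X_tr_g = np.hstack([X_tr, granulate(X_tr, anchors)])
X_te_g = np.hstack([X_te, granulate(X_te, anchors)])

clf = XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1)
clf.fit(X_tr_g, y_tr)
print("test accuracy:", (clf.predict(X_te_g) == y_te).mean())
```

In this sketch, distances to class centroids computed under several metrics act as a coarse, multi-view representation, while concatenating them with the raw features preserves the fine-grained view; using more than one metric is what hedges against the bias of any single distance measure mentioned in the summary.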