Evaluation of Logistic Regression and Advanced Machine Learning Algorithms to Differentiate between Type 1 and Type 2 Diabetes in India
| Published in: | Journal of diabetology Vol. 16; no. 3; pp. 231 - 239 |
|---|---|
| Format: | Journal Article |
| Language: | English |
| Published: | India: Wolters Kluwer - Medknow, 01.07.2025 (Wolters Kluwer Medknow Publications) |
| Edition: | 3 |
| ISSN: | 2543-3288, 2078-7685 |
| Summary: | Abstract
Aim:
We attempted to determine whether machine learning (ML) models outperform logistic regression (LR), a traditional prediction method, in distinguishing type 1 diabetes (T1D) from type 2 diabetes (T2D).
Materials and Methods:
Utilizing data from individuals of Indian origin diagnosed with diabetes between the ages of 10 and 30 years (n = 3531), we evaluated the ability of seven supervised ML algorithms (LR, gradient boosting [GB], decision tree, k-nearest neighbors, random forest [RF], support vector machine [SVM], and Naïve Bayes) to distinguish between T1D and T2D based on eight predictor variables: age at diagnosis, body mass index, total cholesterol, triglycerides, high-density lipoprotein cholesterol, glycated hemoglobin, parental history of diabetes, and glutamic acid decarboxylase antibody status. The dataset was split into training (70%) and testing (30%) subsets, and a grid search approach was used to tune hyperparameters and optimize model performance.
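The split-and-tune procedure described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the abstract does not say whether the 70/30 split was stratified by diabetes type or how grid search scored each candidate, so `stratified_split`, `grid_search`, the toy labels, the hyperparameter names, and the placeholder scoring function are all hypothetical.

```python
import itertools
import random


def stratified_split(labels, test_frac=0.3, seed=42):
    """Split row indices into (train, test), preserving class proportions.

    Assumption: stratification by diabetes type; the abstract only states a 70/30 split.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible split
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)  # per-class size of the test subset
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def grid_search(param_grid, score_fn):
    """Score every hyperparameter combination; return (best_params, best_score).

    `score_fn` is a hypothetical callback that should evaluate the model on
    training data only (e.g. cross-validated AUC), never on the test subset.
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:  # ties keep the first combination seen
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical toy labels: 1 = T1D, 0 = T2D.
labels = [1] * 30 + [0] * 70
train_idx, test_idx = stratified_split(labels)

# Placeholder scorer standing in for cross-validated AUC (illustrative only).
toy_grid = {"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1]}
best, score = grid_search(toy_grid, lambda p: -abs(p["C"] - 1.0) - 0.01 * p["gamma"])
print(len(train_idx), len(test_idx), best)  # prints: 70 30 {'C': 1.0, 'gamma': 0.01}
```

In a real pipeline, `score_fn` would fit the model with the given hyperparameters on the training subset only, and the 30% test subset would be used once, for the final evaluation. scikit-learn offers the same two steps as `train_test_split(..., stratify=y)` and `GridSearchCV`.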
Results:
All fine-tuned ML algorithms demonstrated excellent discriminative ability, with receiver operating characteristic (ROC) area under the curve (AUC) values above 0.95. GB (AUC = 0.9700), LR (AUC = 0.9691), and SVM (AUC = 0.9686) emerged as the top-performing models, performing similarly to one another and better than the remaining algorithms in distinguishing between T1D and T2D. The predictions of these three models were also strongly correlated (LR-SVM: 1.000; LR-GB: 0.979; SVM-GB: 0.980). Additionally, LR, SVM, GB, and RF provided the highest net benefit across a wide range of threshold probabilities in decision curve analysis, highlighting their clinical utility for decision-making.
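The three evaluation quantities in the Results can each be computed from a model's predicted probabilities on the test subset. The sketch below is a plain-Python illustration, not the authors' analysis code. It computes ROC AUC as the probability that a randomly chosen positive case is scored above a randomly chosen negative case (ties count one half). It assumes the reported between-model correlations are Pearson correlations of predicted probabilities, since the abstract does not name the coefficient. It uses the standard decision-curve definition of net benefit at threshold probability pt: TP/n - (FP/n) × pt/(1 - pt). Treating T1D as the positive class is an assumption, and the labels and probabilities are made-up toy values.

```python
import math


def roc_auc(y_true, probs):
    """ROC AUC via pairwise comparison (Mann-Whitney form); O(P*N), fine for a sketch."""
    pos = [p for p, y in zip(probs, y_true) if y == 1]
    neg = [p for p, y in zip(probs, y_true) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def pearson(x, y):
    """Pearson correlation between two models' predicted probabilities (assumed coefficient)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def net_benefit(y_true, probs, threshold):
    """Decision-curve net benefit of classifying as positive when prob >= threshold."""
    n = len(y_true)
    tp = sum(1 for p, y in zip(probs, y_true) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, y_true) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Hypothetical toy test set: 1 = T1D (positive class, an assumption), 0 = T2D.
y = [1, 1, 0, 0, 1, 0]
model_a = [0.9, 0.8, 0.3, 0.2, 0.4, 0.6]
model_b = [0.85, 0.75, 0.35, 0.1, 0.45, 0.55]

print(round(roc_auc(y, model_a), 4))  # 8 of 9 positive/negative pairs ranked correctly
print(round(pearson(model_a, model_b), 4))
print([round(net_benefit(y, model_a, t), 4) for t in (0.1, 0.3, 0.5)])
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds for each model and comparing the result with the "treat all" and "treat none" (net benefit 0) reference strategies. The pairwise AUC loop is quadratic in sample size, so a rank-based formula is preferable for large test sets.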
Conclusion:
In distinguishing T1D from T2D, the classic LR model performed comparably to advanced ML algorithms. |
| DOI: | 10.4103/jod.jod_12_25 |