Evaluation of Logistic Regression and Advanced Machine Learning Algorithms to Differentiate between Type 1 and Type 2 Diabetes in India
| Published in: | Journal of Diabetology Volume 16; Issue 3; pp. 231-239 |
|---|---|
| Main authors: | , , , , , |
| Format: | Journal Article |
| Language: | English |
| Publication details: | India: Wolters Kluwer - Medknow, 01.07.2025; Wolters Kluwer Medknow Publications |
| Edition: | 3 |
| Subjects: | |
| ISSN: | 2543-3288, 2078-7685 |
| Summary: | Abstract
Aim:
We attempted to determine whether machine learning (ML) models outperform logistic regression (LR), a traditional prediction method, in distinguishing type 1 diabetes (T1D) from type 2 diabetes (T2D).
Materials and Methods:
Utilizing data from individuals of Indian origin diagnosed with diabetes between the ages of 10 and 30 years (n = 3531), we evaluated the ability of seven supervised ML algorithms (LR, gradient boosting [GB], decision tree, k-nearest neighbors, random forest [RF], support vector machine [SVM], and Naïve Bayes) to distinguish between T1D and T2D based on eight predictor variables: age at diagnosis, body mass index, total cholesterol, triglycerides, high-density lipoprotein, glycated hemoglobin, parental history, and glutamic acid decarboxylase antibody status. The dataset was split into training (70%) and testing (30%) subsets, and a grid search approach was employed for hyperparameter tuning to optimize model performance.
Results:
All fine-tuned ML algorithms demonstrated excellent discriminative ability, with high receiver operating characteristic (ROC) area under the curve (AUC) values (>0.95). GB (AUC = 0.9700), LR (AUC = 0.9691), and SVM (AUC = 0.9686) emerged as the top-performing models, with similar performance to one another in distinguishing between T1D and T2D. Their predictions were also strongly correlated (LR-SVM: 1.000; LR-GB: 0.979; SVM-GB: 0.980). Additionally, LR, SVM, GB, and RF provided the highest net benefit across a wide range of threshold probabilities, highlighting their clinical utility for decision-making.
Conclusion:
In diabetes classification, the classic LR model showed performance comparable to that of advanced ML algorithms. |
|---|---|
| DOI: | 10.4103/jod.jod_12_25 |
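
The abstract compares models by ROC AUC and by net benefit across threshold probabilities. As a rough illustration of what those two quantities measure, here is a minimal Python sketch. It is not the authors' code: the function names `roc_auc` and `net_benefit`, the toy labels and scores, and the choice of T1D as the positive class (1) are all assumptions made for the example. AUC is computed as the fraction of positive/negative pairs ranked correctly, and net benefit uses the standard decision-curve form TP/n - FP/n * pt/(1 - pt).

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs in which the
    positive case gets the higher score; ties count as half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], scores: Sequence[float],
                threshold: float) -> float:
    """Decision-curve net benefit at threshold probability pt:
    TP/n - FP/n * pt / (1 - pt)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    # A case is called positive when its predicted probability reaches pt.
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


# Toy data invented for illustration only (assumed coding: 1 = T1D, 0 = T2D).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.35, 0.4, 0.2, 0.1, 0.6, 0.7]

print(roc_auc(y_true, scores))           # 0.875
print(net_benefit(y_true, scores, 0.5))  # 0.25
```

Evaluating `net_benefit` over a grid of thresholds and plotting the result for each model gives the kind of decision curve on which a "highest net benefit across a wide range of threshold probabilities" statement would be based.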