Evaluation of Logistic Regression and Advanced Machine Learning Algorithms to Differentiate between Type 1 and Type 2 Diabetes in India

Bibliographic details
Published in: Journal of diabetology, Volume 16, Issue 3, pp. 231-239
Main authors: Venkatesan, Ulagamadesan; Amutha, Anandakumar; Anjana, Ranjit Mohan; Unnikrishnan, Ranjit; Mappillairaju, Bagavandas; Mohan, Viswanathan
Format: Journal Article
Language: English
Publication details: India, Wolters Kluwer - Medknow, 01.07.2025
Wolters Kluwer Medknow Publications
Edition: 3
ISSN: 2543-3288, 2078-7685
Description
Summary: Abstract. Aim: We attempted to determine whether machine learning (ML) models outperform logistic regression (LR), a traditional prediction method, in distinguishing type 1 diabetes (T1D) from type 2 diabetes (T2D). Materials and Methods: Utilizing data from individuals of Indian origin diagnosed with diabetes between the ages of 10 and 30 years (n = 3531), we evaluated the ability of seven supervised ML algorithms (LR, gradient boosting [GB], decision tree, k-nearest neighbors, random forest [RF], support vector machine [SVM], and Naïve Bayes) to distinguish between T1D and T2D based on eight predictor variables: age at diagnosis, body mass index, total cholesterol, triglycerides, high-density lipoprotein, glycated hemoglobin, parental history, and glutamic acid decarboxylase antibody status. The dataset was split into training (70%) and testing (30%) subsets, and a grid search approach was employed for hyperparameter tuning to optimize model performance. Results: All fine-tuned ML algorithms demonstrated excellent discriminative ability, with high receiver operating characteristic (ROC) area under the curve (AUC) values (>0.95). GB (AUC = 0.9700), LR (AUC = 0.9691), and SVM (AUC = 0.9686) emerged as the top-performing models, showing similar and superior performance in distinguishing between T1D and T2D. These algorithms also exhibited strong correlations in their predictions (LR-SVM: 1.000; LR-GB: 0.979; SVM-GB: 0.980). Additionally, LR, SVM, GB, and RF provided the highest net benefit across a wide range of threshold probabilities, highlighting their clinical utility for decision-making. Conclusion: In diabetes classification, the classic LR model showed performance comparable to that of advanced ML algorithms.
DOI: 10.4103/jod.jod_12_25
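
The summary reports two kinds of evaluation measure: ROC AUC (discrimination) and net benefit across threshold probabilities (clinical utility). The record does not say what software the authors used, so the following is only a minimal sketch of the standard textbook definitions in plain Python, not the authors' code. The function names, the coding of the positive class as 1, and the toy labels and scores are all assumptions made up for illustration.

```python
def roc_auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as half a win."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true, scores, threshold):
    """Decision-curve net benefit at threshold probability pt:
    TP/n - (FP/n) * pt / (1 - pt)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    # A case is flagged positive when its predicted probability reaches pt.
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


# Hypothetical toy data: 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]

print(roc_auc(y_true, scores))           # 0.9375
print(net_benefit(y_true, scores, 0.5))  # 0.25
```

Evaluating `net_benefit` over a grid of thresholds for each model, and comparing the resulting curves with the treat-all and treat-none strategies, gives the kind of decision-curve comparison the summary refers to.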