Evaluation of Logistic Regression and Advanced Machine Learning Algorithms to Differentiate between Type 1 and Type 2 Diabetes in India

Bibliographic Details
Published in: Journal of Diabetology, Vol. 16, no. 3, pp. 231-239
Main Authors: Venkatesan, Ulagamadesan; Amutha, Anandakumar; Anjana, Ranjit Mohan; Unnikrishnan, Ranjit; Mappillairaju, Bagavandas; Mohan, Viswanathan
Format: Journal Article
Language:English
Published: India Wolters Kluwer - Medknow 01.07.2025
Wolters Kluwer Medknow Publications
Edition:3
ISSN:2543-3288, 2078-7685
Description
Summary: Abstract Aim: We attempted to determine whether machine learning (ML) models outperform logistic regression (LR), a traditional prediction method, in distinguishing type 1 diabetes (T1D) from type 2 diabetes (T2D). Materials and Methods: Using data from individuals of Indian origin diagnosed with diabetes between the ages of 10 and 30 years (n = 3531), we evaluated the ability of seven supervised ML algorithms (LR, gradient boosting [GB], decision tree, k-nearest neighbors, random forest [RF], support vector machine [SVM], and Naïve Bayes) to distinguish between T1D and T2D based on eight predictor variables: age at diagnosis, body mass index, total cholesterol, triglycerides, high-density lipoprotein, glycated hemoglobin, parental history, and glutamic acid decarboxylase antibody status. The dataset was split into training (70%) and testing (30%) subsets, and a grid search was used to tune hyperparameters. Results: All tuned ML algorithms demonstrated excellent discriminative ability, with receiver operating characteristic (ROC) area under the curve (AUC) values above 0.95. GB (AUC = 0.9700), LR (AUC = 0.9691), and SVM (AUC = 0.9686) emerged as the top-performing models, performing similarly to one another and better than the remaining algorithms in distinguishing between T1D and T2D. The predictions of these three algorithms were also strongly correlated (LR-SVM: 1.000; LR-GB: 0.979; SVM-GB: 0.980). Additionally, LR, SVM, GB, and RF provided the highest net benefit across a wide range of threshold probabilities, highlighting their clinical utility for decision-making. Conclusion: In diabetes classification, the classic LR model performed comparably to advanced ML algorithms.
DOI:10.4103/jod.jod_12_25
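
Illustrative note: the summary compares models by ROC AUC and by net benefit across threshold probabilities. The record does not include the authors' code, so the following is only a minimal sketch of how those two quantities are commonly computed, assuming T1D is coded as the positive class (1) and each model outputs a predicted probability of T1D. The function names and the toy labels and probabilities are hypothetical and are not taken from the study.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the share of (positive, negative) pairs in which the
    positive case gets the higher score; tied pairs count as one half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], probs: Sequence[float], threshold: float) -> float:
    """Net benefit at one threshold probability, as used in decision curve
    analysis: TP/n - (FP/n) * threshold / (1 - threshold)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    # A case is classified as positive when its probability reaches the threshold.
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    return tp / n - (fp / n) * (threshold / (1.0 - threshold))


# Hypothetical toy data (1 = T1D, 0 = T2D); not from the study.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_probs = [0.9, 0.8, 0.35, 0.4, 0.2, 0.1, 0.3, 0.7]

if __name__ == "__main__":
    print("AUC:", roc_auc(toy_labels, toy_probs))
    for t in (0.25, 0.5):
        print(f"net benefit at threshold {t}:", net_benefit(toy_labels, toy_probs, t))
```

Evaluating net_benefit over a grid of thresholds for each model, alongside the "treat all" and "treat none" strategies, gives the kind of decision curve on which a net-benefit comparison like the one in the summary is usually based.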