Detailed bibliography
| Title: |
Predicting lung cancer risk based on artificial intelligence: Leveraging multifactorial inputs for early detection. |
| Authors: |
Guldogan, Emek; Yagin, Fatma Hilal; Karaaslan, Erol |
| Source: |
Medicine Science; Dec2025, Vol. 14 Issue 4, p975-982, 8p |
| Subjects: |
LUNG cancer, MACHINE learning, RISK assessment, SYMPTOMS, EARLY diagnosis, MEDICAL care, ARTIFICIAL intelligence, COMORBIDITY |
| Abstract: |
Lung cancer remains the leading cause of cancer-related mortality worldwide, largely because most cases are detected at advanced stages. This study develops and validates multifactorial machine-learning models that integrate demographic, behavioural, psychological, symptom-based and comorbidity variables to identify individuals at high risk of lung cancer. An anonymised dataset of 13,000 subjects (74% lung-cancer positive) obtained from the public "Lung Cancer Patient Records" repository was pre-processed through recoding, one-hot encoding and stratified train/test partitioning. To address class imbalance, the training subset was balanced with the Synthetic Minority Oversampling Technique (SMOTE). Three supervised algorithms, namely Logistic Regression, Random Forest and Extreme Gradient Boosting (XGBoost), were tuned via grid search with five-fold stratified cross-validation optimising area under the receiver-operating-characteristic curve (AUC). On the independent hold-out set, XGBoost achieved superior discrimination (AUC=0.93), sensitivity (0.95) and F1-score (0.93), followed closely by Random Forest (AUC=0.91). Univariate analyses confirmed significant associations (p<0.001) between lung cancer status and all candidate predictors, with the strongest effect sizes observed for yellow fingers, persistent cough, wheezing, fatigue and peer-pressure–related smoking. The findings demonstrate that incorporating easily elicited clinical symptoms and psychosocial factors alongside traditional risk markers markedly improves early-detection performance over age–smoking models alone. Because all inputs are non-invasive and low-cost, the proposed model can be embedded in electronic-health-record decision support or mobile triage applications, particularly benefiting resource-limited settings. Future work will focus on external validation across diverse populations, temporal modelling of symptom trajectories and cost-effectiveness analyses to inform risk-tailored low-dose CT screening protocols. [ABSTRACT FROM AUTHOR] |
| Database: |
Biomedical Index |