A Light Gradient-Boosting Machine algorithm with Tree-Structured Parzen Estimator for breast cancer diagnosis

Bibliographic Details
Published in: Healthcare analytics (New York, N.Y.) Vol. 4; p. 100218
Main Authors: Omotehinwa, Temidayo Oluwatosin, Oyewola, David Opeoluwa, Dada, Emmanuel Gbenga
Format: Journal Article
Language: English
Published: Elsevier Inc 01.12.2023
Elsevier
ISSN: 2772-4425
Description
Summary: Breast cancer is a common and potentially life-threatening disease. Early and accurate diagnosis of breast cancer is crucial for effective treatment and improved patient outcomes. This study proposed using the Light Gradient-Boosting Machine (LightGBM) algorithm, the Borderline-Synthetic Minority Oversampling Technique (Borderline-SMOTE), and the Tree-Structured Parzen Estimator (TPE) for hyperparameter tuning to enhance the effectiveness of a Machine Learning (ML) model for diagnosing breast cancer. A 10-fold cross-validated, TPE-optimized Borderline-SMOTE LightGBM classifier was built on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset, and its performance was compared with that of a baseline LightGBM model. The TPE-optimized Borderline-SMOTE LightGBM model exhibited a significant improvement in performance over the baseline model, achieving an average accuracy of 99.12%, specificity of 100%, precision of 100%, recall of 97.62%, F1-score of 98.80%, and a Matthews Correlation Coefficient of 98.12%. Compared to previous studies, the TPE-optimized Borderline-SMOTE LightGBM model performed exceptionally well. The study demonstrates the effectiveness of using data augmentation and hyperparameter optimization techniques to improve the performance of ML models for breast cancer diagnosis, which has significant implications for the medical field, where the accurate and efficient diagnosis of breast cancer is critical.

• Propose a light gradient-boosting machine algorithm with a tree-structured Parzen estimator for breast cancer diagnosis.
• The proposed model achieved an accuracy of 99.12%.
• The proposed model performed exceptionally well compared to previous studies.
• The model has the potential to support physicians in breast cancer diagnosis.
• The study’s contributions have implications for breast cancer diagnosis and treatment.
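
The six metrics quoted in the summary are all functions of one binary confusion matrix, and the sketch below shows how they relate. The counts used (41 true positives, 1 false negative, 0 false positives, 72 true negatives) are an assumption made purely for illustration and are not taken from the paper. They are one hypothetical 114-sample split that reproduces the reported percentages. The function name `binary_metrics` is likewise hypothetical.

```python
import math


def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Compute standard binary-classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (no guard against empty classes).
    """
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    specificity = tn / (tn + fp)   # true-negative rate
    precision = tp / (tp + fp)     # positive predictive value
    recall = tp / (tp + fn)        # sensitivity / true-positive rate
    f1 = 2 * precision * recall / (precision + recall)
    # Matthews Correlation Coefficient: lies in [-1, 1]; shown as a percentage below.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "specificity": specificity,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts, NOT from the paper: chosen only because they are
# consistent with the percentages quoted in the summary.
metrics = binary_metrics(tp=41, fn=1, fp=0, tn=72)
for name, value in metrics.items():
    print(f"{name}: {value * 100:.2f}%")
```

Under these assumed counts, the printed values agree with the summary to two decimal places (accuracy 99.12%, specificity 100.00%, precision 100.00%, recall 97.62%, F1-score 98.80%, MCC 98.12%). This shows that the reported figures are mutually consistent, but it does not establish the confusion matrix the authors actually obtained.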
DOI: 10.1016/j.health.2023.100218