Comparison of spatial prediction models from Machine Learning of cholangiocarcinoma incidence in Thailand

Bibliographic details
Title: Comparison of spatial prediction models from Machine Learning of cholangiocarcinoma incidence in Thailand
Authors: Oraya Sahat, Supot Kamsa-ard, Apiradee Lim, Siriporn Kamsa-ard, Matias Garcia-Constantino, Idongesit Ekerete
Source: BMC Public Health, Vol 25, Iss 1, Pp 1-12 (2025)
Publisher information: BMC, 2025.
Publication year: 2025
Collection: LCC: Public aspects of medicine
Subject terms: Cholangiocarcinoma, Spatial Predictions, Prediction Models, Machine Learning, Population-based cancer registries, Thailand, Public aspects of medicine, RA1-1270
Description: Background: Cholangiocarcinoma (CCA) poses a significant public health challenge in Thailand, with notably high incidence rates. This study aimed to compare the performance of spatial prediction models using Machine Learning techniques to analyze the occurrence of CCA across Thailand. Methods: This retrospective cohort study analyzed CCA cases from four population-based cancer registries in Thailand, diagnosed between January 1, 2012, and December 31, 2021. The study employed Machine Learning models (Linear Regression, Random Forest, Neural Network, and Extreme Gradient Boosting (XGBoost)) to predict Age-Standardized Rates (ASR) of CCA based on spatial variables. Model performance was evaluated using Root Mean Square Error (RMSE) and R² with 70:30 train-test validation. Results: The study included 6,379 CCA cases, with a male predominance (4,075 cases; 63.9%) and a mean age of 66.2 years (standard deviation = 11.1 years). The northeastern region accounted for most of the cases (3,898 cases; 61.1%). The overall ASR of CCA was 8.9 per 100,000 person-years (95% CI: 8.7 to 9.2), with the northeastern region showing the highest incidence (ASR = 13.4 per 100,000 person-years; 95% CI: 12.9 to 13.8). In the overall dataset, the Random Forest model demonstrated better prediction performance in both the training (R² = 72.07%) and testing datasets (R² = 71.66%). Regional variations in model performance were observed, with Random Forest performing best in the northern and northeastern regions, while XGBoost excelled in the central and southern regions. The most important spatial predictors for CCA were elevation and distance from water sources. Conclusion: The Random Forest model demonstrated the highest efficiency in predicting CCA incidence rates in Thailand, though predictive performance varied across regions. Spatial factors effectively predicted ASR of CCA, providing valuable insights for national-level disease surveillance and targeted public health interventions. These findings support the development of region-specific approaches for CCA control using spatial epidemiology and machine learning techniques.
Publication type: article
File description: electronic resource
Language: English
ISSN: 1471-2458
Relation: https://doaj.org/toc/1471-2458
DOI: 10.1186/s12889-025-23119-y
Access URL: https://doaj.org/article/3fb6cc2676ba45c39712acb4a70e23de
Accession number: edsdoj.3fb6cc2676ba45c39712acb4a70e23de
Database: Directory of Open Access Journals
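
Note on the evaluation described in the abstract: the study reports RMSE and R² under a 70:30 train-test split. The following is a minimal sketch of that evaluation scheme only, not the authors' code. It assumes synthetic data, a single hypothetical predictor (elevation), and a plain one-variable least-squares fit standing in for the models named in the abstract. All function names and numeric values are illustrative. R² is returned here as a fraction, while the abstract reports it as a percentage.

```python
import math
import random


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of rows and split it into train and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_simple_linear(rows):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def rmse(actual, predicted):
    """Root mean square error between observed and predicted values."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Synthetic, hypothetical data: ASR falls with elevation, plus Gaussian noise.
rng = random.Random(42)
data = []
for _ in range(100):
    elevation = rng.uniform(0, 500)
    asr = 15.0 - 0.02 * elevation + rng.gauss(0, 1.0)
    data.append((elevation, asr))

train, test = train_test_split(data)          # 70 training rows, 30 test rows
intercept, slope = fit_simple_linear(train)   # fit on the training part only

for name, part in (("train", train), ("test", test)):
    actual = [y for _, y in part]
    predicted = [intercept + slope * x for x, _ in part]
    print(name, "RMSE:", round(rmse(actual, predicted), 3),
          "R2:", round(r_squared(actual, predicted), 3))
```

Reporting both metrics on the training and the held-out part, as the abstract does for Random Forest, makes it easier to see whether a model overfits: a large gap between training and testing R² would suggest it does.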