Machine learning prediction of osteoarthritis risk from volatile organic compound exposure using SHAP interpretation in US adults

Bibliographic Details
Title: Machine learning prediction of osteoarthritis risk from volatile organic compound exposure using SHAP interpretation in US adults
Authors: Shanbin Zheng, Jiaqing Zhu, Xun Cao, Zhiyuan Chen, Chao Zhang, Tianwei Xia, Jirong Shen
Source: Scientific Reports, Vol 15, Iss 1, Pp 1-15 (2025)
Publisher Information: Nature Portfolio, 2025.
Publication Year: 2025
Collection: LCC:Medicine; LCC:Science
Subject Terms: Metabolite of volatile organic compound, Osteoarthritis, Machine learning, Shapley additive explanations, Medicine, Science
Description: Exposure to volatile organic compounds (VOCs) is widespread and has been implicated in the pathogenesis of various chronic diseases. However, the specific relationship between VOC exposure and the risk of osteoarthritis (OA) remains poorly characterized. This study aimed to investigate the associations between a broad spectrum of VOC metabolites and OA risk, and to identify the most influential VOC metabolites. We analyzed data from the National Health and Nutrition Examination Survey (NHANES) 2011–2018, comprising 3683 US adults. OA status was self-reported. Exposure levels to 17 VOCs were assessed using their urinary metabolites. After data splitting (70% training, 30% testing), multiple machine learning models were trained and evaluated. The optimal model was interpreted using SHapley Additive exPlanations (SHAP) to identify key predictors and elucidate their dose-response relationships with OA risk. The Linear Discriminant Analysis (LDA) model demonstrated the best predictive performance (AUC = 0.755). SHAP interpretation revealed that, besides age, specific VOC metabolites were among the top predictors of OA. N-Acetyl-S-(3,4-dihydroxybutyl)-l-cysteine (DHBMA, a metabolite of 1,3-butadiene) and N-Acetyl-S-(3-hydroxypropyl-2-methyl)-l-cysteine (HMPMA, a metabolite of crotonaldehyde) were identified as novel and significant risk factors. Further analysis delineated non-linear dose-response relationships between these VOCs and OA risk. Subgroup analyses suggested that the associations were consistent across different demographics. In summary, this study developed a machine learning model based on VOC exposure that effectively predicts osteoarthritis risk. The LDA model achieved robust performance, with SHAP interpretation identifying DHBMA and HMPMA as novel and significant risk factors, in addition to known demographic predictors. Subgroup analyses further confirmed the consistent and non-linear association of these VOC metabolites with OA across diverse populations. These findings underscore the value of integrating environmental exposure data into OA risk prediction and support its potential for targeted prevention strategies in high-risk groups.
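The workflow named in the abstract (70/30 split, an LDA classifier, AUC on the held-out set, additive per-feature attributions) can be illustrated with a minimal sketch. This is not the authors' code. It assumes synthetic data rather than NHANES, only two hypothetical features (`age` and `log_met`), a hand-written two-class LDA, and the closed-form attribution for a linear score under feature independence, w_j * (x_j - mean_j), in place of the SHAP library.

```python
import random

random.seed(0)

# Synthetic stand-in data (NOT NHANES); columns are [age, log_met], both hypothetical.
def make_row(has_oa):
    age = random.gauss(62 if has_oa else 45, 12)
    log_met = random.gauss(1.0 if has_oa else 0.6, 1.0)
    return [age, log_met]

labels = [1 if random.random() < 0.3 else 0 for _ in range(1000)]
rows = [make_row(lab) for lab in labels]

# 70% training, 30% testing, mirroring the split described in the abstract.
idx = list(range(len(rows)))
random.shuffle(idx)
cut = int(0.7 * len(idx))
train, test = idx[:cut], idx[cut:]

def fit_lda(X, y):
    """Two-class, two-feature LDA direction: w = pooled_cov^-1 (mean1 - mean0)."""
    groups = {0: [], 1: []}
    for row, lab in zip(X, y):
        groups[lab].append(row)
    means = {k: [sum(r[j] for r in g) / len(g) for j in range(2)]
             for k, g in groups.items()}
    s = [[0.0, 0.0], [0.0, 0.0]]  # pooled within-class scatter
    for k, g in groups.items():
        for r in g:
            d = [r[0] - means[k][0], r[1] - means[k][1]]
            for a in range(2):
                for b in range(2):
                    s[a][b] += d[a] * d[b]
    n = len(X) - 2
    s = [[v / n for v in line] for line in s]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    diff = [means[1][j] - means[0][j] for j in range(2)]
    return [inv[a][0] * diff[0] + inv[a][1] * diff[1] for a in range(2)]

def auc(scores, labs):
    """AUC as the probability that a positive case outscores a negative one."""
    pos = [s for s, l in zip(scores, labs) if l == 1]
    neg = [s for s, l in zip(scores, labs) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

X_train = [rows[i] for i in train]
y_train = [labels[i] for i in train]
w = fit_lda(X_train, y_train)

def score(x):
    return w[0] * x[0] + w[1] * x[1]

test_auc = auc([score(rows[i]) for i in test], [labels[i] for i in test])

# Additive attributions for a linear score, with training means as the baseline.
mu = [sum(r[j] for r in X_train) / len(X_train) for j in range(2)]

def attributions(x):
    return [w[j] * (x[j] - mu[j]) for j in range(2)]

# Global importance: mean absolute attribution per feature on the test set.
importance = [sum(abs(attributions(rows[i])[j]) for i in test) / len(test)
              for j in range(2)]
print("test AUC:", round(test_auc, 3))
print("mean |attribution| [age, log_met]:", importance)
```

For any single row, the attributions sum to the difference between that row's discriminant score and the score at the training means. That additivity property is what lets per-feature contributions be ranked. A real analysis would use all measured metabolites and covariates, and would attribute the predicted probability rather than the raw score.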
Document Type: article
File Description: electronic resource
Language: English
ISSN: 2045-2322
Relation: https://doaj.org/toc/2045-2322
DOI: 10.1038/s41598-025-23050-7
Access URL: https://doaj.org/article/05446cf3d8b742978f552afae07ceeba
Accession Number: edsdoj.05446cf3d8b742978f552afae07ceeba
Database: Directory of Open Access Journals