Machine learning prediction of osteoarthritis risk from volatile organic compound exposure using SHAP interpretation in US adults
| Title: | Machine learning prediction of osteoarthritis risk from volatile organic compound exposure using SHAP interpretation in US adults |
|---|---|
| Authors: | Shanbin Zheng, Jiaqing Zhu, Xun Cao, Zhiyuan Chen, Chao Zhang, Tianwei Xia, Jirong Shen |
| Source: | Scientific Reports, Vol 15, Iss 1, Pp 1-15 (2025) |
| Publisher Information: | Nature Portfolio, 2025. |
| Publication Year: | 2025 |
| Collection: | LCC:Medicine LCC:Science |
| Subject Terms: | Metabolite of volatile organic compound, Osteoarthritis, Machine learning, Shapley additive explanations, Medicine, Science |
| Description: | Exposure to volatile organic compounds (VOCs) is widespread and has been implicated in the pathogenesis of various chronic diseases. However, the specific relationship between VOC exposure and the risk of osteoarthritis (OA) remains poorly characterized. This study aimed to investigate the associations between a broad spectrum of VOC metabolites and OA risk, and to identify the most influential VOC metabolites. We analyzed data from the National Health and Nutrition Examination Survey (NHANES) 2011–2018, comprising 3683 US adults. OA status was self-reported. Exposure levels to 17 VOCs were assessed using their urinary metabolites. After data splitting (70% training, 30% testing), multiple machine learning models were trained and evaluated. The optimal model was interpreted using SHapley Additive exPlanations (SHAP) to identify key predictors and elucidate their dose-response relationships with OA risk. The Linear Discriminant Analysis (LDA) model demonstrated the best predictive performance (AUC = 0.755). SHAP interpretation revealed that, besides age, specific VOC metabolites were among the top predictors of OA. N-Acetyl-S-(3,4-dihydroxybutyl)-L-cysteine (DHBMA, a metabolite of 1,3-butadiene) and N-Acetyl-S-(3-hydroxypropyl-2-methyl)-L-cysteine (HMPMA, a metabolite of crotonaldehyde) were identified as novel and significant risk factors. Further analysis delineated non-linear dose-response relationships between these VOCs and OA risk. Subgroup analyses suggested that the associations were consistent across different demographics. In summary, this study developed a machine learning model based on VOC exposure that effectively predicts osteoarthritis risk. The LDA model achieved robust performance, with SHAP interpretation identifying DHBMA and HMPMA as novel and significant risk factors, in addition to known demographic predictors. Subgroup analyses further confirmed the consistent and non-linear association of these VOC metabolites with OA across diverse populations. These findings underscore the value of integrating environmental exposure data into OA risk prediction and support its potential for targeted prevention strategies in high-risk groups. |
| Document Type: | article |
| File Description: | electronic resource |
| Language: | English |
| ISSN: | 2045-2322 |
| Relation: | https://doaj.org/toc/2045-2322 |
| DOI: | 10.1038/s41598-025-23050-7 |
| Access URL: | https://doaj.org/article/05446cf3d8b742978f552afae07ceeba |
| Accession Number: | edsdoj.05446cf3d8b742978f552afae07ceeba |
| Database: | Directory of Open Access Journals |
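
The abstract describes a workflow of a 70/30 split, a linear discriminant model, AUC evaluation, and SHAP attribution. The record gives no implementation details, so the following is only a minimal sketch of that kind of workflow and not the authors' code. It uses the Python standard library, synthetic data, and two hypothetical features (`age`, `marker`). The closed-form attribution `w_j * (x_j - mean_j)` is the SHAP value of a linear score only under the assumption that features are independent.

```python
import random

def split_70_30(rows, seed=0):
    """Shuffle a copy of rows and split it into 70% training, 30% testing."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def fit_lda_2d(X, y):
    """Two-class LDA direction for exactly two features: w = S^-1 (mu1 - mu0)."""
    groups = {c: [x for x, t in zip(X, y) if t == c] for c in (0, 1)}
    mu = {c: [sum(col) / len(g) for col in zip(*g)] for c, g in groups.items()}
    # Pooled within-class covariance matrix S (2 x 2).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for c, g in groups.items():
        for x in g:
            d = [x[0] - mu[c][0], x[1] - mu[c][1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    n = len(X) - 2
    s = [[v / n for v in row] for row in s]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    diff = [mu[1][0] - mu[0][0], mu[1][1] - mu[0][1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

def score(w, x):
    """Linear discriminant score (intercept omitted; it affects neither AUC nor SHAP)."""
    return sum(wi * xi for wi, xi in zip(w, x))

def auc(scores, labels):
    """AUC as the chance a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def linear_shap(w, x, background_means):
    """SHAP values of a linear score, assuming independent features."""
    return [wi * (xi - m) for wi, xi, m in zip(w, x, background_means)]

# Synthetic data (an assumption for illustration, not NHANES).
rng = random.Random(42)
rows = []
for i in range(400):
    label = i % 2
    age = rng.gauss(50 + 10 * label, 10)
    marker = rng.gauss(1.0 + 0.5 * label, 1.0)
    rows.append(((age, marker), label))

train, test = split_70_30(rows)
X_train = [x for x, _ in train]
y_train = [t for _, t in train]
w = fit_lda_2d(X_train, y_train)

test_auc = auc([score(w, x) for x, _ in test], [t for _, t in test])
means = [sum(col) / len(X_train) for col in zip(*X_train)]
phi = linear_shap(w, test[0][0], means)

print(f"test AUC = {test_auc:.3f}")
print(f"SHAP values (age, marker) for first test row: {phi}")
```

The attributions for a row sum to that row's score minus the mean training score, which is the additivity property that SHAP summary plots rely on.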