BoostDILI: Extreme Gradient Boost-Powered Drug-Induced Liver Injury Prediction and Structural Alerts Generation

Detailed bibliography
Published in: Chemical Research in Toxicology, vol. 38, no. 5, p. 865
Main authors: Chutia, Hillul; Borah, Gori Sankar; Mahanta, Hridoy Jyoti; Nagamani, Selvaraman
Format: Journal Article
Language: English
Published: United States, 19 May 2025
ISSN: 1520-5010
Description
Abstract: Over the past 60 years, drug-induced liver injury (DILI) has been a key reason for withdrawing marketed drugs on safety grounds. Early prediction of DILI is crucial for developing safer pharmaceuticals, yet current testing methods are complex and cumbersome. In this study, we developed a machine learning (ML) model for DILI prediction powered by extreme gradient boosting (XGB). Comparing DILI prediction models is difficult because they are trained on different public data sets. We comprehensively evaluated the proposed BoostDILI model to address two questions: (1) Can insights derived from public data sets help predict DILI for drugs approved by the Food and Drug Administration (FDA)? (2) Can structural alerts be generated to improve the model's explainability? To address the first question, we built a DILI prediction model from four publicly available data sets. The resulting BoostDILI model achieved an accuracy of 0.70 in 5-fold cross-validation (CV). Relevant descriptors were identified by sequential feature selection, and the final model combines 12 RDKit descriptors with 23 Mordred descriptors. To address the second question, Bayesian statistics were applied iteratively to identify high-performing substructures, and a structural alerts model was developed. The model was further validated on two data sets of FDA-approved drugs, DILIst and DILIRank. BoostDILI offers a trustworthy tool for evaluating DILI risk in preclinical research, and its structural alerts help identify substructures that may be responsible for DILI. The data set and source code are available at https://github.com/Naga270588/BoostDILI.
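The abstract states only that Bayesian statistics were applied iteratively to find high-performing substructures. The Python sketch below shows one way such a procedure could work; it is not the authors' implementation (their code is in the repository linked above). It assumes substructure matches are already computed (for example, with RDKit substructure searching), places a uniform Beta(1, 1) prior on the DILI rate among compounds containing each substructure, and flags a substructure as an alert when the posterior probability that its rate exceeds the data-set base rate passes a threshold. The greedy loop, which removes compounds covered by each accepted alert before rescoring, is an assumed reading of "iteratively"; the function names, substructure names, thresholds, and toy data are all hypothetical.

```python
from math import comb


def prob_enriched(k: int, n: int, base_rate: float) -> float:
    """Posterior P(rate > base_rate) for a substructure found in n compounds,
    k of which are DILI-positive, under a uniform Beta(1, 1) prior.

    The posterior is Beta(k + 1, n - k + 1). For integer parameters its upper
    tail equals a binomial CDF: P(rate > x) = P(Binomial(n + 1, x) <= k).
    """
    m = n + 1
    return sum(comb(m, j) * base_rate**j * (1 - base_rate) ** (m - j)
               for j in range(k + 1))


def iterative_alerts(occurrences, labels, min_support=3, threshold=0.95):
    """Greedy, iterative alert selection (hypothetical, not BoostDILI's code).

    occurrences: dict mapping a substructure name to the set of compound
                 indices that contain it (assumed precomputed, e.g. by RDKit).
    labels:      list of 0/1 DILI labels, one per compound.
    Returns a list of (name, k, n, posterior_probability) tuples.
    """
    base_rate = sum(labels) / len(labels)
    remaining = set(range(len(labels)))  # compounds not yet explained by an alert
    candidates = dict(occurrences)
    alerts = []
    while candidates:
        best = None
        for name, idx in candidates.items():
            covered = idx & remaining  # score only on still-unexplained compounds
            n = len(covered)
            if n < min_support:
                continue
            k = sum(labels[i] for i in covered)
            p = prob_enriched(k, n, base_rate)
            if p >= threshold and (best is None or p > best[3]):
                best = (name, k, n, p)
        if best is None:  # no remaining substructure qualifies
            break
        alerts.append(best)
        remaining -= candidates.pop(best[0])  # drop compounds the alert covers
    return alerts


# Hypothetical toy data: compounds 0-9 are DILI-positive, 10-19 negative.
labels = [1] * 10 + [0] * 10
occurrences = {
    "nitroaromatic": {0, 1, 2, 3, 4, 5},
    "thiophene": {0, 1, 6, 7, 8},
    "halogenated_aryl": {2, 6, 7, 8, 9},
    "carboxylic_acid": {10, 11, 12, 13, 14, 15},
}

alerts = iterative_alerts(occurrences, labels)
for name, k, n, p in alerts:
    print(f"{name}: {k}/{n} DILI-positive, P(enriched) = {p:.4f}")
```

On the toy data, nitroaromatic is accepted first and halogenated_aryl second. Thiophene would pass the threshold on its own, but once the compounds explained by the other two alerts are removed, too few unexplained matches remain, which is the point of rescoring iteratively.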
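The abstract reports that descriptors were chosen by sequential feature selection and that the model was assessed with 5-fold CV. The sketch below illustrates forward sequential selection wrapped around k-fold cross-validation in plain Python. To keep it dependency-free, a nearest-centroid classifier stands in for XGBoost; the forward direction, the stop-when-no-improvement rule, the fold assignment, and all function names and data are assumptions rather than details from the paper. A similar loop is available in scikit-learn as SequentialFeatureSelector, which can wrap an xgboost XGBClassifier.

```python
from statistics import mean


def kfold_indices(n, k=5):
    """Deterministic k-fold split: sample i is assigned to fold i % k."""
    return [[i for i in range(n) if i % k == f] for f in range(k)]


def cv_accuracy(X, y, features, folds):
    """Mean k-fold accuracy of a nearest-centroid classifier (a stand-in for
    XGBoost) that uses only the descriptor columns listed in `features`."""
    scores = []
    for test in folds:
        test_set = set(test)
        train = [i for i in range(len(y)) if i not in test_set]
        # One centroid per class, computed on the training folds only.
        centroids = {}
        for c in sorted(set(y[i] for i in train)):
            rows = [X[i] for i in train if y[i] == c]
            centroids[c] = [mean(r[f] for r in rows) for f in features]
        correct = 0
        for i in test:
            x = [X[i][f] for f in features]
            # Predict the class whose centroid is closest (squared Euclidean).
            pred = min(centroids,
                       key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
            correct += pred == y[i]
        scores.append(correct / len(test))
    return mean(scores)


def forward_selection(X, y, max_features, k=5):
    """Greedily add the descriptor that most improves CV accuracy; stop when no
    candidate improves it or max_features is reached."""
    folds = kfold_indices(len(y), k)
    selected, best_score = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        trial = {f: cv_accuracy(X, y, selected + [f], folds) for f in remaining}
        f_best = max(trial, key=trial.get)
        if trial[f_best] <= best_score:  # no improvement: stop
            break
        selected.append(f_best)
        remaining.remove(f_best)
        best_score = trial[f_best]
    return selected, best_score


# Hypothetical data: 10 compounds x 3 descriptors; only column 0 separates classes.
y = [i % 2 for i in range(10)]
X = [[10.0 * y[i] + 0.1 * i, 1.0, float(i % 3)] for i in range(10)]

selected, score = forward_selection(X, y, max_features=3)
print(selected, score)
```

Here only descriptor 0 separates the classes, so the loop selects it with a CV accuracy of 1.0 and then stops, since neither remaining column can improve on that. The stand-in classifier keeps the example runnable without third-party packages; it says nothing about how XGBoost would rank the 12 RDKit and 23 Mordred descriptors reported for BoostDILI.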
DOI: 10.1021/acs.chemrestox.4c00532