BoostDILI: Extreme Gradient Boost-Powered Drug-Induced Liver Injury Prediction and Structural Alerts Generation


Bibliographic Details
Published in: Chemical Research in Toxicology, Vol. 38, no. 5, p. 865
Main Authors: Chutia, Hillul; Borah, Gori Sankar; Mahanta, Hridoy Jyoti; Nagamani, Selvaraman
Format: Journal Article
Language: English
Published: United States, 19 May 2025
ISSN: 1520-5010
Description
Summary: Over the past 60 years, drug-induced liver injury (DILI) has played a key role in the withdrawal of marketed drugs due to safety concerns. Early prediction of DILI is crucial for developing safer pharmaceuticals, yet current in vitro and in vivo testing methods are complex and cumbersome. In this study, we developed an extreme gradient boosting (XGB)-powered machine learning (ML) model for DILI prediction. Comparing DILI prediction models is challenging because they rely on different public data sets. We comprehensively evaluated the proposed BoostDILI model to address two questions: (1) Can insights derived from public data sets help in DILI prediction for Food and Drug Administration (FDA)-approved drugs? (2) Can we generate structural alerts to improve the model's explainability? To address the first question, we developed a DILI prediction model using four publicly available data sets. This effort led to the BoostDILI model, which achieved a 5-fold cross-validation (CV) accuracy of 0.70. A sequential feature selection method was employed to identify relevant descriptors, and the final model combines descriptors computed with RDKit (12 features) and Mordred (23 features). To address the second question, Bayesian statistics were applied iteratively to identify high-performance substructures, and a structural alerts model was developed from them. The developed model was further validated with two FDA-approved drug data sets, DILIst and DILIRank. The BoostDILI model offers a trustworthy solution for evaluating DILI risk in preclinical research, and the structural alerts help identify the substructures that may be responsible for DILI. The data set and the source code are available at https://github.com/Naga270588/BoostDILI.
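The summary does not say which Bayesian formulation was used to rank substructures, so the following is only a minimal sketch of one common approach: a Beta-prior (smoothed) estimate of how often compounds containing a substructure are DILI-positive, compared against the data set's base rate. The function names, the thresholds `min_support` and `min_enrichment`, and the toy counts are all hypothetical and are not taken from the BoostDILI repository. It performs a single pass rather than the iterative search the summary mentions.

```python
from typing import Dict, Tuple


def posterior_positive_rate(n_pos: int, n_total: int,
                            base_rate: float,
                            prior_strength: float = 2.0) -> float:
    """Posterior mean of P(DILI-positive | substructure present).

    Uses a Beta prior centred on the data set's base rate (an assumption),
    so substructures seen in only a few compounds are pulled toward it.
    """
    alpha = base_rate * prior_strength
    beta = (1.0 - base_rate) * prior_strength
    return (n_pos + alpha) / (n_total + alpha + beta)


def select_alerts(counts: Dict[str, Tuple[int, int]],
                  base_rate: float,
                  min_support: int = 5,
                  min_enrichment: float = 1.5) -> Dict[str, float]:
    """Keep substructures whose smoothed positive rate exceeds the base
    rate by at least `min_enrichment` (hypothetical cut-offs)."""
    alerts = {}
    for name, (n_pos, n_total) in counts.items():
        if n_total < min_support:
            continue  # too few compounds to judge
        rate = posterior_positive_rate(n_pos, n_total, base_rate)
        if rate / base_rate >= min_enrichment:
            alerts[name] = rate
    return alerts


# Toy counts, purely illustrative: (DILI-positive compounds, all compounds)
toy_counts = {
    "substructure_A": (18, 20),
    "substructure_B": (30, 60),
    "substructure_C": (3, 3),
    "substructure_D": (9, 12),
}
alerts = select_alerts(toy_counts, base_rate=0.5)
print(alerts)
```

With these toy numbers only `substructure_A` passes both filters; `substructure_C` is skipped for low support even though all of its compounds are positive, which is the behaviour the prior and the support threshold are meant to produce.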
DOI: 10.1021/acs.chemrestox.4c00532