BoostDILI: Extreme Gradient Boost-Powered Drug-Induced Liver Injury Prediction and Structural Alerts Generation

Over the past 60 years, drug-induced liver injury (DILI) has played a key role in the withdrawal of marketed drugs due to safety concerns. Early prediction of DILI is crucial for developing safer pharmaceuticals, yet current in vitro and in vivo testing methods are complex and cumbersome. In this study, we developed...

Bibliographic Details
Published in: Chemical Research in Toxicology Vol. 38; no. 5; p. 865
Main Authors: Chutia, Hillul; Borah, Gori Sankar; Mahanta, Hridoy Jyoti; Nagamani, Selvaraman
Format: Journal Article
Language: English
Published: United States 19.05.2025
Subjects:
ISSN: 1520-5010
Abstract Over the past 60 years, drug-induced liver injury (DILI) has played a key role in the withdrawal of marketed drugs due to safety concerns. Early prediction of DILI is crucial for developing safer pharmaceuticals, yet current in vitro and in vivo testing methods are complex and cumbersome. In this study, we developed an extreme gradient boosting (XGB)-powered machine learning (ML) model for DILI prediction. Comparing DILI prediction models is challenging because they rely on different public data sets. We comprehensively evaluated the proposed BoostDILI model to address two questions: (1) Can insights derived from public data sets help in DILI prediction for Food and Drug Administration (FDA)-approved drugs? (2) Can we generate structural alerts to improve the model's explainability? To address the first question, we developed a DILI prediction model using four publicly available data sets. This effort led to the BoostDILI model, which achieved a 5-fold cross-validation (CV) accuracy of 0.70. A sequential feature selection method was employed to identify relevant descriptors; the resulting model integrates descriptors derived from RDKit (12 features) and Mordred (23 features). To address the second question, Bayesian statistics was applied iteratively to identify high-performance substructures, and a structural alerts model was developed. The model was further validated with two FDA-approved drug data sets, DILIst and DILIRank. The BoostDILI model offers a trustworthy solution for evaluating DILI risk in preclinical research, and the structural alerts help identify the substructures that may be responsible for DILI. The data set and the source code are available at https://github.com/Naga270588/BoostDILI.
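Illustrative note on the structural-alerts step: the abstract says only that Bayesian statistics was used to find high-performance substructures and does not give the procedure. The sketch below is therefore not the authors' method. It assumes a simple Beta-Binomial estimate with a uniform Beta(1, 1) prior, which is one common way to rank candidate substructures by how often the compounds containing them are labeled DILI-positive. The function name, the substructure names (alert_A, alert_B, alert_C), and the toy labels are all hypothetical and invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class AlertScore:
    substructure: str      # hypothetical substructure identifier
    n_hits: int            # compounds containing the substructure
    n_toxic: int           # of those, compounds labeled DILI-positive
    posterior_mean: float  # Beta-Binomial posterior mean of the toxic rate


def score_substructures(hits, labels, prior_a=1.0, prior_b=1.0):
    """Rank substructures by the posterior mean of their DILI-positive rate.

    hits   : dict mapping a substructure name to the set of compound indices
             that contain it (assumed to come from a substructure search).
    labels : list of 0/1 DILI labels, indexed by compound.
    prior_a, prior_b : Beta prior parameters (assumption: uniform prior).
    """
    scores = []
    for name, compound_ids in hits.items():
        n = len(compound_ids)
        k = sum(labels[i] for i in compound_ids)
        # Posterior mean of Beta(prior_a + k, prior_b + n - k)
        mean = (k + prior_a) / (n + prior_a + prior_b)
        scores.append(AlertScore(name, n, k, mean))
    # Highest estimated toxic rate first
    return sorted(scores, key=lambda s: s.posterior_mean, reverse=True)


# Toy data, invented for illustration only (not from the paper).
toy_labels = [1, 1, 1, 0, 0, 1, 0, 0]
toy_hits = {
    "alert_A": {0, 1, 2},
    "alert_B": {3, 4, 5},
    "alert_C": {0, 5},
}

if __name__ == "__main__":
    for s in score_substructures(toy_hits, toy_labels):
        print(f"{s.substructure}: {s.n_toxic}/{s.n_hits} toxic, "
              f"posterior mean = {s.posterior_mean:.2f}")
```

The prior keeps rarely seen substructures from scoring 0 or 1 on the strength of one or two compounds, which matters when candidate alerts are screened iteratively on small data sets.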
Author_xml – sequence: 1
  givenname: Hillul
  surname: Chutia
  fullname: Chutia, Hillul
  organization: CSIR-North East Institute of Science and Technology, Jorhat 785006, India
– sequence: 2
  givenname: Gori Sankar
  surname: Borah
  fullname: Borah, Gori Sankar
  organization: School of Computer Science, The Assam Kaziranga University, Jorhat 785006, India
– sequence: 3
  givenname: Hridoy Jyoti
  surname: Mahanta
  fullname: Mahanta, Hridoy Jyoti
  organization: Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
– sequence: 4
  givenname: Selvaraman
  orcidid: 0000-0002-7825-3994
  surname: Nagamani
  fullname: Nagamani, Selvaraman
  organization: Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
DOI 10.1021/acs.chemrestox.4c00532
Discipline Public Health
Pharmacy, Therapeutics, & Pharmacology
EISSN 1520-5010
ISICitedReferencesCount 1
IsPeerReviewed true
IsScholarly true
Issue 5
Language English
ORCID 0000-0002-7825-3994
PMID 40241442
PublicationDate 2025-05-19
PublicationPlace United States
PublicationTitle Chemical research in toxicology
PublicationTitleAlternate Chem Res Toxicol
StartPage 865
SubjectTerms Boosting Machine Learning Algorithms
Chemical and Drug Induced Liver Injury
Humans
Machine Learning
Pharmaceutical Preparations - chemistry
United States Food and Drug Administration
Title BoostDILI: Extreme Gradient Boost-Powered Drug-Induced Liver Injury Prediction and Structural Alerts Generation
URI https://www.ncbi.nlm.nih.gov/pubmed/40241442
https://www.proquest.com/docview/3191147662
Volume 38