Efficient Exploration of Chemical Space with Docking and Deep Learning

Bibliographic details
Published in: Journal of Chemical Theory and Computation, Volume 17, Issue 11, p. 7106
Main authors: Yang, Ying; Yao, Kun; Repasky, Matthew P; Leswing, Karl; Abel, Robert; Shoichet, Brian K; Jerome, Steven V
Format: Journal Article
Language: English
Published: 9 November 2021
ISSN: 1549-9626
Abstract: With the advent of make-on-demand commercial libraries, the number of purchasable compounds available for virtual screening and assay has grown explosively in recent years, with several libraries eclipsing one billion compounds. Today's screening libraries are larger and more diverse, enabling the discovery of more-potent hit compounds and unlocking new areas of chemical space, represented by new core scaffolds. Applying physics-based in silico screening methods in an exhaustive manner, where every molecule in the library must be enumerated and evaluated independently, is increasingly cost-prohibitive. Here, we introduce a protocol for machine learning-enhanced molecular docking based on active learning to dramatically increase throughput over traditional docking. We leverage a novel selection protocol that strikes a balance between two objectives: (1) identifying the best scoring compounds and (2) exploring a large region of chemical space, demonstrating superior performance compared to a purely greedy approach. Together with automated redocking of the top compounds, this method captures almost all the high scoring scaffolds in the library found by exhaustive docking. This protocol is applied to our recent virtual screening campaigns against the D4 and AMPC targets that produced dozens of highly potent, novel inhibitors, and a blind test against the MT1 target. Our protocol recovers more than 80% of the experimentally confirmed hits with a 14-fold reduction in compute cost, and more than 90% of the hit scaffolds in the top 5% of model predictions, preserving the diversity of the experimentally confirmed hit compounds.
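The abstract describes the selection protocol only at a high level. As a rough illustration, one active-learning loop that mixes greedy and exploratory picks might be sketched in Python as follows. Everything in the sketch is an assumption made for illustration: the toy `dock` function, the nearest-neighbour surrogate in `predict`, the `explore_fraction` parameter, and uniform random sampling as the exploration step are hypothetical stand-ins, not the authors' deep learning model or their published selection rule.

```python
import random


def dock(compound):
    """Hypothetical stand-in for an expensive docking run (lower = better)."""
    return (compound / 1000 - 0.7) ** 2


def predict(compound, labeled):
    """Toy surrogate model: reuse the score of the nearest docked compound."""
    nearest = min(labeled, key=lambda known: abs(known - compound))
    return labeled[nearest]


def select_batch(pool, labeled, batch_size, explore_fraction, rng):
    """Choose the next compounds to dock: part greedy, part exploratory."""
    n_explore = int(batch_size * explore_fraction)
    n_greedy = batch_size - n_explore
    # Greedy part: compounds with the best (lowest) predicted score.
    ranked = sorted(pool, key=lambda c: predict(c, labeled))
    greedy = ranked[:n_greedy]
    # Exploratory part: random picks from the rest of the pool
    # (an assumed, simplistic way to cover more of chemical space).
    explore = rng.sample(ranked[n_greedy:], n_explore)
    return greedy + explore


def active_learning_screen(library, n_rounds=5, batch_size=20,
                           explore_fraction=0.5, seed=0):
    """Dock only a subset of the library, guided by the surrogate model."""
    rng = random.Random(seed)
    initial = rng.sample(list(library), batch_size)
    labeled = {c: dock(c) for c in initial}
    pool = [c for c in library if c not in labeled]
    for _ in range(n_rounds):
        batch = select_batch(pool, labeled, batch_size, explore_fraction, rng)
        for c in batch:
            labeled[c] = dock(c)  # the only place the costly oracle is called
        chosen = set(batch)
        pool = [c for c in pool if c not in chosen]
    return labeled


library = list(range(1000))  # compounds are just integer IDs in this toy
docked = active_learning_screen(library)
print(f"docked {len(docked)} of {len(library)} compounds")
```

Setting `explore_fraction=0` reduces the sketch to the purely greedy strategy that the abstract compares against. In a real campaign the surrogate would be retrained on all docked compounds after each round.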
DOI: 10.1021/acs.jctc.1c00810