Efficient Exploration of Chemical Space with Docking and Deep Learning
| Published in: | Journal of chemical theory and computation Vol. 17; No. 11; p. 7106 |
|---|---|
| Main authors: | Yang, Ying; Yao, Kun; Repasky, Matthew P; Leswing, Karl; Abel, Robert; Shoichet, Brian K; Jerome, Steven V |
| Format: | Journal Article |
| Language: | English |
| Published: | 2021-11-09 |
| ISSN: | 1549-9626 |
| Online access: | Further details |
| Abstract | With the advent of make-on-demand commercial libraries, the number of purchasable compounds available for virtual screening and assay has grown explosively in recent years, with several libraries eclipsing one billion compounds. Today's screening libraries are larger and more diverse, enabling the discovery of more-potent hit compounds and unlocking new areas of chemical space, represented by new core scaffolds. Applying physics-based in silico screening methods in an exhaustive manner, where every molecule in the library must be enumerated and evaluated independently, is increasingly cost-prohibitive. Here, we introduce a protocol for machine learning-enhanced molecular docking based on active learning to dramatically increase throughput over traditional docking. We leverage a novel selection protocol that strikes a balance between two objectives: (1) identifying the best scoring compounds and (2) exploring a large region of chemical space, demonstrating superior performance compared to a purely greedy approach. Together with automated redocking of the top compounds, this method captures almost all the high scoring scaffolds in the library found by exhaustive docking. This protocol is applied to our recent virtual screening campaigns against the D4 and AMPC targets that produced dozens of highly potent, novel inhibitors, and a blind test against the MT1 target. Our protocol recovers more than 80% of the experimentally confirmed hits with a 14-fold reduction in compute cost, and more than 90% of the hit scaffolds in the top 5% of model predictions, preserving the diversity of the experimentally confirmed hit compounds. |
|---|---|
| Author | Shoichet, Brian K; Yao, Kun; Jerome, Steven V; Yang, Ying; Leswing, Karl; Repasky, Matthew P; Abel, Robert |
| Author_xml | 1. Yang, Ying; 2. Yao, Kun; 3. Repasky, Matthew P; 4. Leswing, Karl; 5. Abel, Robert; 6. Shoichet, Brian K; 7. Jerome, Steven V |
| ContentType | Journal Article |
| DOI | 10.1021/acs.jctc.1c00810 |
| DatabaseName | MEDLINE - Academic |
| Discipline | Chemistry |
| EISSN | 1549-9626 |
| ISICitedReferencesCount | 247 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000718183600033&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 1549-9626 |
| IngestDate | Thu Jul 10 22:40:29 EDT 2025 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 11 |
| Language | English |
| PublicationCentury | 2000 |
| PublicationDate | 20211109 |
| PublicationDateYYYYMMDD | 2021-11-09 |
| PublicationDate_xml | month: 11; year: 2021; text: 20211109; day: 09 |
| PublicationDecade | 2020 |
| PublicationTitle | Journal of chemical theory and computation |
| PublicationYear | 2021 |
| SourceID | proquest |
| SourceType | Aggregation Database |
| StartPage | 7106 |
| Title | Efficient Exploration of Chemical Space with Docking and Deep Learning |
| URI | https://www.proquest.com/docview/2578765145 |
| Volume | 17 |
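
The abstract describes a selection step that balances picking the best-scoring compounds against exploring a large region of chemical space, but this record does not give the details of that protocol. The sketch below is therefore only an illustration of the general idea, not the authors' method. The function name `select_batch`, the `greedy_fraction` parameter, the use of a scaffold label as the diversity criterion, and the toy scores are all assumptions made for the example.

```python
def select_batch(predicted_scores, scaffold_of, batch_size, greedy_fraction=0.5):
    """Pick compounds to dock in the next active-learning round.

    Hypothetical illustration only: part of the batch is taken greedily
    by predicted docking score (lower is better), the rest is filled with
    the best-predicted compound from scaffolds not yet in the batch.
    """
    # Rank compound IDs from best (most negative) to worst predicted score.
    ranked = sorted(predicted_scores, key=predicted_scores.get)

    # Greedy part: the top-ranked compounds, regardless of scaffold.
    n_greedy = int(batch_size * greedy_fraction)
    chosen = ranked[:n_greedy]
    covered = {scaffold_of[c] for c in chosen}

    # Exploration part: walk down the ranking, taking only new scaffolds.
    for compound in ranked[n_greedy:]:
        if len(chosen) == batch_size:
            break
        if scaffold_of[compound] not in covered:
            chosen.append(compound)
            covered.add(scaffold_of[compound])

    # Top up by score if there were too few unseen scaffolds.
    picked = set(chosen)
    for compound in ranked:
        if len(chosen) == batch_size:
            break
        if compound not in picked:
            chosen.append(compound)
            picked.add(compound)
    return chosen


# Toy data (made up): scaffold A dominates the top of the ranking.
predicted = {"a1": -9.0, "a2": -8.9, "a3": -8.8, "b1": -7.0, "c1": -6.5, "d1": -6.0}
scaffold = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "c1": "C", "d1": "D"}

balanced = select_batch(predicted, scaffold, batch_size=4)
greedy = select_batch(predicted, scaffold, batch_size=4, greedy_fraction=1.0)
print(balanced)  # ['a1', 'a2', 'b1', 'c1']
print(greedy)    # ['a1', 'a2', 'a3', 'b1']
```

On this toy input the balanced batch covers three scaffolds while the purely greedy batch covers two, which is the kind of trade-off the abstract refers to. In a real workflow the predicted scores would come from a trained surrogate model and the batch would be sent to the docking program, both of which are outside the scope of this sketch.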