FAME: Fragment-based Conditional Molecular Generation for Phenotypic Drug Discovery
| Published in: | Proceedings of the ... SIAM International Conference on Data Mining, Vol. 2022, p. 720 |
|---|---|
| Main authors: | Pham, Thai-Hoang; Xie, Lei; Zhang, Ping |
| Format: | Journal Article |
| Language: | English |
| Published: | United States, 2022 |
| ISSN: | 2167-0102 |
| Abstract | De novo molecular design is a key challenge in drug discovery due to the complexity of chemical space. With the availability of molecular datasets and advances in machine learning, many deep generative models have been proposed for generating novel molecules with desired properties. However, most existing models focus only on molecular distribution learning and target-based molecular design, which limits their potential in real-world applications. In drug discovery, phenotypic molecular design has advantages over target-based molecular design, especially in first-in-class drug discovery. In this work, we propose the first deep graph generative model (FAME) targeting phenotypic molecular design, in particular gene expression-based molecular design. FAME leverages a conditional variational autoencoder framework to learn the conditional distribution that generates molecules from gene expression profiles. However, this distribution is difficult to learn due to the complexity of the molecular space and the noise in gene expression data. To tackle these issues, a gene expression denoising (GED) model that employs a contrastive objective function is first proposed to reduce noise in gene expression data. FAME then treats molecules as sequences of fragments and learns to generate these fragments in an autoregressive manner. By leveraging this fragment-based generation strategy and the denoised gene expression profiles, FAME can generate novel molecules with a high validity rate and desired biological activity. The experimental results show that FAME outperforms existing methods, including both SMILES-based and graph-based deep generative models, for phenotypic molecular design. Furthermore, the effective mechanism for reducing noise in gene expression data proposed in our study can be applied to omics data modeling in general to facilitate phenotypic drug discovery. |
|---|---|
| Author | Pham, Thai-Hoang; Xie, Lei; Zhang, Ping |
| Author_xml | 1. Pham, Thai-Hoang: Department of Computer Science and Engineering, The Ohio State University, Columbus, USA; 2. Xie, Lei: Department of Computer Science, Hunter College, The City University of New York, New York City, USA, and Neuroscience, Weill Cornell Medicine, New York City, USA; 3. Zhang, Ping: Department of Biomedical Informatics and Department of Computer Science and Engineering, The Ohio State University, Columbus, USA |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/35509686 (View this record in MEDLINE/PubMed) |
| ContentType | Journal Article |
| DBID | NPM 7X8 |
| DOI | 10.1137/1.9781611977172.81 |
| DatabaseName | PubMed MEDLINE - Academic |
| DatabaseTitle | PubMed MEDLINE - Academic |
| DatabaseTitleList | PubMed MEDLINE - Academic |
| Database_xml | 1. PubMed (dbid: NPM; http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed; source type: Index Database); 2. MEDLINE - Academic (dbid: 7X8; https://search.proquest.com/medline; source type: Aggregation Database) |
| DeliveryMethod | no_fulltext_linktorsrc |
| Discipline | Computer Science |
| ExternalDocumentID | 35509686 |
| Genre | Journal Article |
| GrantInformation_xml | NLM NIH HHS (grant R01 LM013771); NIGMS NIH HHS (grant R01 GM141279) |
| ID | FETCH-LOGICAL-p310t-6832c0e39f73ecfcebc859c03bca03fb657b9bfb9796c147280c389da846aad02 |
| IEDL.DBID | 7X8 |
| ISICitedReferencesCount | 9 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001281343300080&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 2167-0102 |
| IngestDate | Fri Jul 11 05:47:45 EDT 2025 Thu Apr 03 07:06:51 EDT 2025 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | false |
| IsScholarly | false |
| Keywords | conditional generation; contrastive learning; fragment; variational autoencoder; gene expression |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-p310t-6832c0e39f73ecfcebc859c03bca03fb657b9bfb9796c147280c389da846aad02 |
| Notes | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23 |
| OpenAccessLink | https://www.ncbi.nlm.nih.gov/pmc/articles/9061137 |
| PMID | 35509686 |
| PQID | 2660101578 |
| PQPubID | 23479 |
| ParticipantIDs | proquest_miscellaneous_2660101578 pubmed_primary_35509686 |
| PublicationCentury | 2000 |
| PublicationDate | 2022 |
| PublicationDateYYYYMMDD | 2022-01-01 |
| PublicationDate_xml | year: 2022 |
| PublicationDecade | 2020 |
| PublicationPlace | United States |
| PublicationPlace_xml | United States |
| PublicationTitle | Proceedings of the ... SIAM International Conference on Data Mining |
| PublicationTitleAlternate | Proc SIAM Int Conf Data Min |
| PublicationYear | 2022 |
| SSID | ssj0001057420 |
| Score | 1.8489869 |
| SourceID | proquest pubmed |
| SourceType | Aggregation Database Index Database |
| StartPage | 720 |
| Title | FAME: Fragment-based Conditional Molecular Generation for Phenotypic Drug Discovery |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/35509686; https://www.proquest.com/docview/2660101578 |
| Volume | 2022 |
| WOSCitedRecordID | wos001281343300080 |
| inHoldings | 1 |
| linkProvider | ProQuest |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=FAME%3A+Fragment-based+Conditional+Molecular+Generation+for+Phenotypic+Drug+Discovery&rft.jtitle=Proceedings+of+the+...+SIAM+International+Conference+on+Data+Mining&rft.au=Pham%2C+Thai-Hoang&rft.au=Xie%2C+Lei&rft.au=Zhang%2C+Ping&rft.date=2022-01-01&rft.issn=2167-0102&rft.volume=2022&rft.spage=720&rft_id=info:doi/10.1137%2F1.9781611977172.81&rft.externalDBID=NO_FULL_TEXT |
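
The abstract names two training signals without giving their formulas: a contrastive objective for the GED denoising model and a conditional variational autoencoder objective for FAME. The following Python sketch shows one common instance of each, under loudly stated assumptions: an InfoNCE-style loss in which two replicate profiles of the same perturbation form a positive pair and the other profiles in the batch act as negatives, and a negative ELBO with a diagonal-Gaussian posterior and a standard-normal prior. The function names (`info_nce`, `gaussian_kl`, `cvae_loss`) and the pairing scheme are hypothetical and are not taken from FAME's code.

```python
# Hypothetical sketch: FAME's actual loss definitions are not specified in this record.
import math
from typing import List, Sequence

Vector = Sequence[float]


def _l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def info_nce(anchors: Sequence[Vector], positives: Sequence[Vector],
             temperature: float = 0.1) -> float:
    """InfoNCE loss (assumed form of the contrastive objective).

    Row i of `anchors` should match row i of `positives` (e.g. two replicate
    expression profiles of one perturbation, an assumed pairing); every other
    row of `positives` acts as an in-batch negative.
    """
    if len(anchors) != len(positives):
        raise ValueError("anchors and positives must have the same batch size")
    a = [_l2_normalize(v) for v in anchors]
    p = [_l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i to every candidate, divided by temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]  # negative log-probability of the true pair
    return total / len(a)


def gaussian_kl(mu: Vector, logvar: Vector) -> float:
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) for a single latent vector."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def cvae_loss(recon_nll: float, mu: Vector, logvar: Vector, beta: float = 1.0) -> float:
    """Negative ELBO: reconstruction NLL of the molecule plus a weighted KL term.

    `recon_nll` would come from the (here omitted) decoder, e.g. the summed
    negative log-likelihood of the fragment sequence given the latent code and
    the expression profile; the standard-normal prior is an assumption.
    """
    return recon_nll + beta * gaussian_kl(mu, logvar)
```

A toy call such as `info_nce([[1.0, 0.2, 0.0], [0.0, 0.1, 1.0]], [[0.9, 0.3, 0.1], [0.1, 0.0, 0.8]])` returns a small loss because each anchor is most similar to its own replicate; lowering `temperature` sharpens that contrast. In a real model, the embeddings and the reconstruction term would come from trained networks, which this sketch leaves out.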