FAME: Fragment-based Conditional Molecular Generation for Phenotypic Drug Discovery

Bibliographic details
Published in: Proceedings of the ... SIAM International Conference on Data Mining, Vol. 2022; p. 720
Main authors: Pham, Thai-Hoang; Xie, Lei; Zhang, Ping
Format: Journal Article
Language: English
Published: United States 2022
ISSN: 2167-0102
Abstract De novo molecular design is a key challenge in drug discovery due to the complexity of chemical space. With the availability of molecular datasets and advances in machine learning, many deep generative models have been proposed for generating novel molecules with desired properties. However, most existing models focus only on molecular distribution learning and target-based molecular design, which limits their potential in real-world applications. In drug discovery, phenotypic molecular design has advantages over target-based molecular design, especially in first-in-class drug discovery. In this work, we propose the first deep graph generative model (FAME) targeting phenotypic molecular design, in particular gene expression-based molecular design. FAME leverages a conditional variational autoencoder framework to learn the conditional distribution for generating molecules from gene expression profiles. However, this distribution is difficult to learn due to the complexity of the molecular space and the noise in gene expression data. To tackle these issues, a gene expression denoising (GED) model that employs a contrastive objective function is first proposed to reduce noise in gene expression data. FAME is then designed to treat molecules as sequences of fragments and to learn to generate these fragments in an autoregressive manner. By leveraging this fragment-based generation strategy and the denoised gene expression profiles, FAME can generate novel molecules with a high validity rate and desired biological activity. The experimental results show that FAME outperforms existing methods, including both SMILES-based and graph-based deep generative models, for phenotypic molecular design. Furthermore, the effective mechanism for reducing noise in gene expression data proposed in our study can be applied to omics data modeling in general to facilitate phenotypic drug discovery.
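The abstract names a conditional variational autoencoder with autoregressive fragment generation but does not state the training objective. As a rough sketch only, assuming the textbook conditional VAE bound rather than the authors' actual loss, the objective could be written as below. The symbols are illustrative assumptions, not taken from the paper: x is a molecule viewed as a fragment sequence (f_1, ..., f_T), c is the denoised gene expression profile, z is the latent variable, and θ, φ are decoder and encoder parameters.

```latex
\log p_\theta(x \mid c) \;\ge\;
  \mathbb{E}_{q_\phi(z \mid x, c)}\big[\log p_\theta(x \mid z, c)\big]
  - \mathrm{KL}\big(q_\phi(z \mid x, c)\,\|\,p_\theta(z \mid c)\big),
\qquad
p_\theta(x \mid z, c) = \prod_{t=1}^{T} p_\theta\big(f_t \mid f_{<t}, z, c\big)
```

The first expression is the usual evidence lower bound conditioned on c; the second factorizes the decoder over fragments, which is one common way to express generation "in an autoregressive manner".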
Author Xie, Lei
Zhang, Ping
Pham, Thai-Hoang
Author_xml – sequence: 1
  givenname: Thai-Hoang
  surname: Pham
  fullname: Pham, Thai-Hoang
  organization: Department of Computer Science and Engineering, The Ohio State University, Columbus, USA
– sequence: 2
  givenname: Lei
  surname: Xie
  fullname: Xie, Lei
  organization: Department of Computer Science, Hunter College, The City University of New York, New York City, USA; Neuroscience, Weill Cornell Medicine, New York City, USA
– sequence: 3
  givenname: Ping
  surname: Zhang
  fullname: Zhang, Ping
  organization: Department of Biomedical Informatics and Department of Computer Science and Engineering, The Ohio State University, Columbus, USA
BackLink https://www.ncbi.nlm.nih.gov/pubmed/35509686$$D View this record in MEDLINE/PubMed
ContentType Journal Article
DBID NPM
7X8
DOI 10.1137/1.9781611977172.81
DatabaseName PubMed
MEDLINE - Academic
DatabaseTitle PubMed
MEDLINE - Academic
DatabaseTitleList PubMed
MEDLINE - Academic
Database_xml – sequence: 1
  dbid: NPM
  name: PubMed
  url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
– sequence: 2
  dbid: 7X8
  name: MEDLINE - Academic
  url: https://search.proquest.com/medline
  sourceTypes: Aggregation Database
DeliveryMethod no_fulltext_linktorsrc
Discipline Computer Science
ExternalDocumentID 35509686
Genre Journal Article
GrantInformation_xml – fundername: NLM NIH HHS
  grantid: R01 LM013771
– fundername: NIGMS NIH HHS
  grantid: R01 GM141279
ISICitedReferencesCount 9
ISSN 2167-0102
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
Keywords conditional generation
contrastive learning
fragment
variational autoencoder
gene expression
Language English
OpenAccessLink https://www.ncbi.nlm.nih.gov/pmc/articles/9061137
PMID 35509686
PublicationDate 2022
PublicationPlace United States
PublicationPlace_xml – name: United States
PublicationTitle Proceedings of the ... SIAM International Conference on Data Mining
PublicationTitleAlternate Proc SIAM Int Conf Data Min
PublicationYear 2022
StartPage 720
Title FAME: Fragment-based Conditional Molecular Generation for Phenotypic Drug Discovery
URI https://www.ncbi.nlm.nih.gov/pubmed/35509686
https://www.proquest.com/docview/2660101578
Volume 2022