Geometry-Based Molecular Generation With Deep Constrained Variational Autoencoder


Bibliographic Details
Published in: IEEE Transactions on Neural Networks and Learning Systems Vol. 35; no. 4; pp. 4852-4861
Main Authors: Li, Chunyan, Yao, Junfeng, Wei, Wei, Niu, Zhangming, Zeng, Xiangxiang, Li, Jin, Wang, Jianmin
Format: Journal Article
Language: English
Published: United States IEEE 01.04.2024
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 2162-237X, 2162-2388
Description
Summary: Finding target molecules with specific chemical properties plays a decisive role in drug development. We propose GEOM-CVAE, a constrained variational autoencoder that uses geometric representations to generate molecules with specific properties in a protein-context-dependent manner. In machine learning terms, it comprises a continuous feature-embedding encoder and a molecular generation decoder. Our key contribution is an efficient geometric embedding method that combines a spatial structure representation of the drug molecule (converting its 3-D coordinates into an image) with a geometric graph representation of the protein target (modeling the protein surface as a mesh). This 3-D geometric information is vital to successful molecular generation and distinguishes our approach from previous molecular generative methods based on 1-D or 2-D representations. The framework generates specific molecules in two phases. It first generates a special image carrying molecular 3-D information to learn latent representations, and generates molecules under a constraint condition derived from geometric graph convolution on the specific protein. It then feeds the generated structural molecules into a parser network to obtain Simplified Molecular Input Line Entry System (SMILES) strings. The model achieves competitive performance, which suggests its potential for exploring the vast chemical space in drug discovery.
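The summary mentions converting a molecule's 3-D coordinates into an image but does not say how. As a rough illustration of the general idea only, the sketch below bins atom positions into a cubic occupancy grid centered on the molecule's centroid. It is not the encoding used by GEOM-CVAE: the function name `voxelize`, the parameters `grid_size` and `half_width`, and the three-atom example are all hypothetical.

```python
def voxelize(coords, grid_size=8, half_width=4.0):
    """Count atoms per cell of a cubic grid centered on the centroid.

    Hypothetical helper for illustration; not taken from the paper.
    coords: list of (x, y, z) positions in angstroms.
    """
    n = len(coords)
    centroid = [sum(p[k] for p in coords) / n for k in range(3)]
    cell = 2.0 * half_width / grid_size  # edge length of one grid cell
    grid = [[[0] * grid_size for _ in range(grid_size)]
            for _ in range(grid_size)]
    for p in coords:
        # Shift into the box [0, 2 * half_width) and find the cell index.
        idx = [int((p[k] - centroid[k] + half_width) // cell) for k in range(3)]
        if all(0 <= i < grid_size for i in idx):
            grid[idx[0]][idx[1]][idx[2]] += 1  # atoms outside the box are dropped
    return grid


# Hypothetical three-atom example (roughly water-like geometry).
atoms = [(0.000, 0.000, 0.117), (0.000, 0.757, -0.469), (0.000, -0.757, -0.469)]
image = voxelize(atoms)
total = sum(v for plane in image for row in plane for v in row)
print(len(image), total)
```

A grid like this discards atom types and bond information. A practical encoding would likely add one channel per element or property, which is a further assumption rather than something the record states.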
DOI: 10.1109/TNNLS.2022.3147790