NRC-VABS: Normalized Reparameterized Conditional Variational Autoencoder with applied beam search in latent space for drug molecule design
| Published in: | Expert Systems with Applications, Vol. 240, p. 122396 |
|---|---|
| Main authors: | , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Ltd, 15 April 2024 |
| ISSN: | 0957-4174, 1873-6793 |

Summary: Designing optimal drug molecule structures with desired properties is a challenging problem. Most existing solutions and representations reported in the literature are complex and time-consuming because of large datasets, long training periods and long learning dependencies. Deep generative models can support chemical modelling by generating molecules without explicit, complex molecular rules. However, deep learning models such as LSTM-based VAEs suffer from posterior collapse, large dataset vocabularies and sub-optimal latent-space search mechanisms. Motivated by this, we propose the Normalized Reparameterized Conditional Variational Autoencoder with applied beam search in latent space (NRC-VABS). The model uses a normalized vocabulary, a conditionally augmented dataset and a revised (reparameterized) loss function. With these, it addresses posterior collapse and constructs a continuous, consistent latent space that beam search exploits during generation. The conditions (properties) of desirable molecules are specified through a condition vector, which is used both in training and in the generation of drug molecules. Beam search operates on an improved, normalized SMILES representation: samples are created with beam search, filtered according to their condition, and the optimal molecules with the desired properties are identified. Normalization also increases the information content and reduces complexity in the latent space. A tunable parameter (D) is used to control the diversity of the generated molecules. NRC-VABS is evaluated on the benchmark datasets GDB13, MOSES and a subset of 250k ZINC molecules. The evaluation uses metrics such as validity, uniqueness, novelty, accuracy and Fréchet ChemNet Distance, and compares its performance with state-of-the-art peer techniques.

NRC-VABS generates molecules with validity ranging from 92% to 84% and accuracy from 89% to 97% at different levels of diversity (D = 1, D = 2 and D = 3). The approach is also applied to interpolation and to controlling the other two of the three properties by varying one property at a time. Generating only target molecules with the desired properties while maintaining diversity increases novelty. It also greatly reduces time complexity, since only novel, desired molecules are generated.

Highlights:
• NRC-VABS improves SMILES notation to reduce the complexity of SMILES strings.
• A solution to posterior collapse is proposed that enhances NRC-VABS performance.
• The latent space is explored using a beam search algorithm.
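The abstract does not spell out the revised loss function. What follows is only a minimal sketch of the standard conditional-VAE ingredients it builds on:

- the reparameterization trick,
- the closed-form KL term of a Gaussian VAE,
- concatenation of a condition vector with the latent code,
- linear KL annealing, a generic and commonly used mitigation for posterior collapse.

This is not the NRC-VABS loss. All function names, the warm-up schedule and the example values are hypothetical.

```python
import math
import random
from typing import List, Sequence


def reparameterize(mu: Sequence[float], log_var: Sequence[float],
                   rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * eps, eps ~ N(0, 1). In an autodiff framework,
    gradients would flow through mu and log_var rather than the random draw."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def gaussian_kl(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


def kl_weight(step: int, warmup_steps: int) -> float:
    """Hypothetical linear KL annealing: weight grows from 0 to 1 over warmup_steps."""
    return min(1.0, step / warmup_steps)


def conditional_decoder_input(z: Sequence[float],
                              condition: Sequence[float]) -> List[float]:
    """Concatenate the latent vector with the property condition vector."""
    return list(z) + list(condition)


def cvae_loss(reconstruction_nll: float, mu: Sequence[float],
              log_var: Sequence[float], step: int,
              warmup_steps: int = 1000) -> float:
    """Reconstruction negative log-likelihood plus the annealed KL term."""
    return reconstruction_nll + kl_weight(step, warmup_steps) * gaussian_kl(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(0)
    z = reparameterize([0.0, 0.5], [0.0, -1.0], rng)
    # Three hypothetical normalized property values as the condition vector.
    dec_in = conditional_decoder_input(z, [0.3, 1.2, 0.8])
    print(len(dec_in))                                 # 5
    print(cvae_loss(2.0, [1.0], [0.0], step=1000))     # 2.5
```

Keeping the KL weight near zero early in training lets the decoder learn to use the latent code before the KL term pulls the posterior toward the prior. That is the usual intuition behind annealing; the paper may use a different remedy.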
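The abstract describes creating samples with beam search and then filtering them by their condition. Below is a generic sketch of that generate-then-filter loop over SMILES-like tokens, under these assumptions:

- `toy_decoder` is a hypothetical stand-in for a decoder network conditioned on the latent vector and the condition vector.
- The end token `"$"` is an assumption.
- The property predicate is a plain string test. A real system would compute molecular properties with a cheminformatics toolkit.

```python
import math
from typing import Callable, Dict, List, Tuple

END = "$"  # hypothetical end-of-sequence token


def beam_search(next_log_probs: Callable[[str], Dict[str, float]],
                beam_width: int, max_len: int) -> List[Tuple[str, float]]:
    """Keep the beam_width best prefixes per step; return finished sequences, best first.
    Beams still unfinished after max_len steps are discarded."""
    beams: List[Tuple[str, float]] = [("", 0.0)]  # (prefix, cumulative log-probability)
    finished: List[Tuple[str, float]] = []
    for _ in range(max_len):
        candidates: List[Tuple[str, float]] = []
        for prefix, score in beams:
            for token, log_p in next_log_probs(prefix).items():
                if token == END:
                    finished.append((prefix, score + log_p))
                else:
                    candidates.append((prefix + token, score + log_p))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if not beams:
            break
    return sorted(finished, key=lambda c: c[1], reverse=True)


def filter_by_condition(results: List[Tuple[str, float]],
                        satisfies: Callable[[str], bool]) -> List[str]:
    """Keep only sequences that pass the (hypothetical) property check."""
    return [seq for seq, _ in results if satisfies(seq)]


def toy_decoder(prefix: str) -> Dict[str, float]:
    """Hypothetical decoder: fixed next-token log-probabilities, max length 3."""
    if len(prefix) >= 3:
        return {END: 0.0}
    if not prefix:
        return {"C": math.log(0.7), "O": math.log(0.3)}
    return {"C": math.log(0.6), "O": math.log(0.3), END: math.log(0.1)}


if __name__ == "__main__":
    results = beam_search(toy_decoder, beam_width=2, max_len=5)
    print(results[0][0])  # CCC
    print(filter_by_condition(results, lambda s: "O" in s))
```

Scoring with summed log-probabilities avoids the numerical underflow of multiplying many small probabilities.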
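Validity, uniqueness and novelty are commonly computed as below. The sketch follows the usual MOSES-style definitions, which may differ in detail from the paper's.

- The validity check and canonicalization are passed in as functions. In practice one would typically use RDKit, where `Chem.MolFromSmiles` returns `None` for a SMILES string it cannot parse.
- `toy_is_valid` is only a hypothetical placeholder.
- Accuracy and the diversity parameter D are not covered, because the abstract does not define them.

```python
from typing import Callable, Dict, Iterable, List


def generation_metrics(generated: List[str], training: Iterable[str],
                       is_valid: Callable[[str], bool],
                       canonicalize: Callable[[str], str] = lambda s: s) -> Dict[str, float]:
    """validity = valid / generated, uniqueness = unique / valid,
    novelty = (unique and not in training) / unique."""
    valid = [canonicalize(s) for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - {canonicalize(s) for s in training}
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


def toy_is_valid(smiles: str) -> bool:
    """Hypothetical placeholder: non-empty with balanced parentheses.
    This is NOT a real chemical validity check."""
    return bool(smiles) and smiles.count("(") == smiles.count(")")


if __name__ == "__main__":
    m = generation_metrics(["CCO", "CCO", "C(C", "CCN", "c1ccccc1"], ["CCO"], toy_is_valid)
    print(m)  # {'validity': 0.8, 'uniqueness': 0.75, 'novelty': 0.6666666666666666}
```

Canonicalizing both the generated and the training strings matters. Without it, two different SMILES spellings of the same molecule would be counted as distinct and inflate uniqueness and novelty.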
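For reference, Fréchet ChemNet Distance fits one Gaussian to the ChemNet activations of the generated molecules, $(\mu_G, \Sigma_G)$, and another to those of a reference set, $(\mu_R, \Sigma_R)$. It then reports the Fréchet distance between the two Gaussians. Lower values indicate that the generated molecules are closer to the reference distribution. This is the general definition of the metric, not anything specific to NRC-VABS:

$$
\mathrm{FCD} = \lVert \mu_G - \mu_R \rVert_2^2 + \operatorname{Tr}\!\left(\Sigma_G + \Sigma_R - 2\,(\Sigma_G \Sigma_R)^{1/2}\right)
$$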
| DOI: | 10.1016/j.eswa.2023.122396 |