Extracting prime protein targets as possible drug candidates: machine learning evaluation

Bibliographic Details
Published in:	Medical & biological engineering & computing, Vol. 61, no. 11, pp. 3035-3048
Main Authors:	Chattopadhyay, Subhagata; Do, Nhat Phuong; Flower, Darren R.; Chattopadhyay, Amit K.
Format:	Journal Article
Language:	English
Published:	Berlin/Heidelberg: Springer Berlin Heidelberg, 01.11.2023 (Springer Nature B.V.)
Subjects:	Algorithms; Biomedical and Life Sciences; Biomedical Engineering and Bioengineering; Biomedicine; Cluster analysis; Clustering; Computer Applications; Data mining; Decoys; Drug Design; Drug development; Drug resistance; drugs; head; Human Physiology; Imaging; Learning algorithms; Ligands; Machine Learning; methicillin-resistant Staphylococcus aureus; Modelling; Molecular docking; Molecular Docking Simulation; Original; Original Article; Outliers (statistics); prediction; Probabilistic models; Proteins; Radiology; tail; Vector quantization; Forward modeling; Reverse modeling; Machine learning (ML); DBSCAN; K-means clustering; Gaussian mixture model; Protein–ligand interaction; DUD-E repository; Protein targets
ISSN:	0140-0118, 1741-0444

Description
Summary:	Extracting “high ranking” or “prime protein targets” (PPTs) as potent MRSA drug candidates from a given set of ligands is a key challenge in efficient molecular docking. This study combines protein-versus-ligand matching molecular docking (MD) data extracted from 10 independent MD evaluations (ADFR, DOCK, Gemdock, Ledock, Plants, Psovina, Quickvina2, smina, vina, and vinaxb) to identify top MRSA drug candidates. Twenty-nine active protein targets (APTs) from the enhanced DUD-E repository (http://DUD-E.decoys.org) are matched against 1040 ligands using “forward modeling” machine learning for initial “data mining and modeling” (DDM) to extract PPTs and the corresponding high affinity ligands (HALs). K-means clustering (KMC) is then performed on 400 ligands matched against the 29 protein targets (PTs), with each cluster accommodating HALs and the corresponding PPTs. The performance of KMC is then validated against randomly chosen head, tail, and middle active ligands (ALs). KMC outcomes are also validated against two other clustering methods, namely Gaussian mixture model (GMM) and density-based spatial clustering of applications with noise (DBSCAN). While GMM yields results similar to those of KMC, DBSCAN fails to yield more than one cluster or to handle the noise (outliers), thus affirming the choice of KMC or GMM. Databases obtained from ADFR to mine PPTs are then ranked according to the number of the corresponding HAL-PPT combinations (HPCs) inside the derived clusters, an approach called “reverse modeling” (RM). From the set of 29 PTs studied, RM predicts high fidelity for 5 PPTs (17%) that bind with 76 of the 400 ligands (19%), leading to a prediction of next-generation MRSA drug candidates: PPT2 (average HPC 41.1%) is the top choice, followed by PPT14 (average HPC 25.46%) and then PPT15 (average HPC 23.12%). This algorithm can be generically implemented irrespective of pathogenic forms and is particularly effective for sparse data.
DOI:	10.1007/s11517-023-02893-0
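
Illustrative sketch (not taken from the article): the summary describes clustering ligands by their docking scores and then ranking protein targets by how many high-affinity ligand-target combinations fall inside the clusters. The Python below is a minimal stand-in for that idea under stated assumptions. The score matrix is invented toy data, the cutoff of -8.0 is arbitrary, lower scores are assumed to mean stronger binding, and the functions `kmeans` and `average_hpc` are hypothetical names whose definitions of clustering initialization and of "average HPC" may differ from the authors'. The GMM and DBSCAN comparison is not shown.

```python
# Toy sketch: cluster ligands by docking scores, then rank targets by their
# average share of high-affinity ligand-target pairs per cluster.
# All data, names, and the cutoff are assumptions made for illustration.

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length score vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next centroid: the point farthest from all centroids chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        # Assignment step: each ligand joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def average_hpc(scores, labels, cutoff):
    """Per cluster, compute each target's percentage of the high-affinity
    pairs (score <= cutoff); then average over clusters and rank targets."""
    n_targets = len(scores[0])
    per_cluster = []
    for c in sorted(set(labels)):
        counts = [0] * n_targets
        for row, l in zip(scores, labels):
            if l == c:
                for t, s in enumerate(row):
                    if s <= cutoff:
                        counts[t] += 1
        total = sum(counts)
        if total:  # skip clusters with no high-affinity pairs
            per_cluster.append([100.0 * n / total for n in counts])
    avg = [sum(col) / len(per_cluster) for col in zip(*per_cluster)]
    return sorted(enumerate(avg), key=lambda kv: kv[1], reverse=True)

# Rows are ligands, columns are three hypothetical protein targets.
scores = [
    [-9.0, -5.0, -4.0],
    [-9.2, -5.1, -4.2],
    [-8.8, -4.9, -4.1],
    [-4.0, -8.5, -5.0],
    [-4.2, -8.7, -5.1],
    [-4.1, -8.9, -8.2],
]

labels, _ = kmeans(scores, k=2)
ranking = average_hpc(scores, labels, cutoff=-8.0)
for target, share in ranking:
    print(f"target {target}: average share {share:.1f}%")
```

The farthest-point initialization is chosen here only to keep the toy run deterministic; a production analysis would more likely rely on an established clustering library and on the scoring definitions given in the full article.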