Development and validation of a novel protein-ligand fingerprint to mine chemogenomic space: application to G protein-coupled receptors and their ligands

Bibliographic details
Published in: Journal of Chemical Information and Modeling, Vol. 49, No. 4, p. 1049
Main authors: Weill, Nathanael; Rognan, Didier
Medium: Journal Article
Language: English
Publication details: United States, 27 April 2009
ISSN: 1549-9596
Description
Abstract: The present study introduces a novel low-dimensionality fingerprint that encodes both ligand and target properties and is suitable for mining protein-ligand chemogenomic space. Whereas ligand properties are represented by standard descriptors, protein cavities are encoded by a fixed-length bit string describing the pharmacophoric properties of a fixed number of binding-site residues. To simplify the cavity fingerprint, the concept was applied here to a single family of targets (G protein-coupled receptors, GPCRs) with a homogeneous cavity description. Particular attention was paid to assembling data sets of truly diverse protein-ligand pairs that cover both ligand and target spaces as exhaustively as possible. Several machine learning classification algorithms were trained on two sets of roughly 200000 receptor-ligand fingerprints that differ in their definition of inactive decoys. Cross-validated models show excellent precision (>0.9) in distinguishing true from false pairs, with support vector machine classifiers performing best. When applied to two external test sets of GPCR ligands, however, the most predictive models were not those that performed best in cross-validation. The ability to recover true GPCR ligands (ligand prediction mode) or true GPCRs (receptor prediction mode) depends on multiple parameters: the molecular complexity of the ligands, the chemical space from which ligand decoys are selected to generate false protein-ligand pairs, and the target space under consideration. In most cases, predicting ligands is easier than predicting receptors. Although receptor profiling is possible, it probably requires a more detailed description of the ligand-binding site. Notably, protein-ligand fingerprints outperform the corresponding ligand fingerprints in mining the GPCR-ligand space. Since they can be applied to a much larger number of receptors than ligand-based fingerprints, protein-ligand fingerprints represent a novel and promising way to directly screen protein-ligand pairs in chemogenomic applications.
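The abstract describes the fingerprint only at a high level: standard ligand descriptors concatenated with a fixed-length cavity bit string that encodes the pharmacophoric properties of a fixed number of binding-site residues. The Python sketch below illustrates that layout under stated assumptions. The six pharmacophoric classes, the residue-to-property table, the example descriptor values, and the helper names (`cavity_bits`, `pair_fingerprint`, `decoy_pairs`, `precision`) are hypothetical and are not taken from the paper. The decoy helper naively treats every unobserved receptor-ligand combination as a false pair, which is a simplification of the decoy definitions the study compares.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical pharmacophoric classes (assumption: the paper's exact set is not given here).
PROPERTIES = ("hydrophobic", "aromatic", "hbond_donor", "hbond_acceptor",
              "positive", "negative")

# Illustrative, assumed property assignment for a few amino acids.
RESIDUE_PROPERTIES: Dict[str, Set[str]] = {
    "ALA": {"hydrophobic"},
    "PHE": {"hydrophobic", "aromatic"},
    "TRP": {"hydrophobic", "aromatic", "hbond_donor"},
    "TYR": {"aromatic", "hbond_donor", "hbond_acceptor"},
    "SER": {"hbond_donor", "hbond_acceptor"},
    "ASP": {"hbond_acceptor", "negative"},
    "LYS": {"hbond_donor", "positive"},
    "GLY": set(),
}


def cavity_bits(residues: Sequence[str], n_residues: int) -> List[int]:
    """Fixed-length cavity bit string: len(PROPERTIES) bits per binding-site position."""
    if len(residues) != n_residues:
        raise ValueError(f"expected {n_residues} cavity residues, got {len(residues)}")
    bits: List[int] = []
    for res in residues:
        props = RESIDUE_PROPERTIES.get(res, set())  # unknown residues get all-zero bits
        bits.extend(int(p in props) for p in PROPERTIES)
    return bits


def pair_fingerprint(ligand_descriptors: Sequence[float],
                     residues: Sequence[str], n_residues: int) -> List[float]:
    """Protein-ligand fingerprint: ligand descriptors followed by cavity bits."""
    cavity = [float(b) for b in cavity_bits(residues, n_residues)]
    return list(ligand_descriptors) + cavity


def decoy_pairs(receptors: Sequence[str], ligands: Sequence[str],
                known_pairs: Set[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Naive false pairs: every receptor-ligand combination not observed as active."""
    return [(r, l) for r in receptors for l in ligands if (r, l) not in known_pairs]


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of pairs predicted as true that really are true (0.0 if none predicted)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else 0.0
```

For example, `pair_fingerprint([312.4, 2.1, 3.0], ["PHE", "ASP", "SER"], n_residues=3)` yields the 3 descriptor values followed by 18 cavity bits (6 per residue). Because every receptor in a homogeneous family is described by the same number of cavity positions, all fingerprints share one length, which is what allows a single classifier (for instance, a support vector machine) to be trained on pairs from many receptors; that training step is omitted here.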
DOI: 10.1021/ci800447g