In silico Platform for Prediction of N-, O- and C-Glycosites in Eukaryotic Protein Sequences

Bibliographic Details
Published in:	PloS one Vol. 8; no. 6; p. e67008
Main Authors:	Chauhan, Jagat Singh; Rao, Alka; Raghava, Gajendra P. S.
Format:	Journal Article
Language:	English
Published:	United States: Public Library of Science (PLoS), 28.06.2013
Subjects:	Amino Acid Motifs; Amino Acid Sequence; Amino acids; Animals; Bioinformatics; Biology; Cell interactions; Cell recognition; Computational Biology; Computer Science; Datasets; Endoplasmic reticulum; Enzymes; Glycoproteins; Glycoproteins - chemistry; Glycoproteins - metabolism; Glycosylation; Host-pathogen interactions; Humans; Internet; Learning algorithms; Machine learning; Mammals; Mathematical models; Models, Biological; Molecular Sequence Data; Open access; Pathogens; Post-translation; Prediction models; Prokaryotes; Protein folding; Protein Processing, Post-Translational; Proteins; Proteomics; Servers; Software; Support Vector Machine; Support vector machines
ISSN:	1932-6203

Description
Summary:	Glycosylation is one of the most abundant and important post-translational modifications of proteins. Glycosylated proteins (glycoproteins) are involved in various cellular functions such as protein folding, cell-cell interactions, cell recognition and host-pathogen interactions. A large number of eukaryotic glycoproteins also have therapeutic and potential technological applications. Therefore, characterization and analysis of glycosites (glycosylated residues) in these proteins is of great interest to biologists. To cater to these needs, a number of in silico tools have been developed over the years; however, a need for better prediction tools remains. In this study we have therefore developed a new web server, GlycoEP, for more accurate prediction of N-linked, O-linked and C-linked glycosites in eukaryotic glycoproteins using two larger datasets, namely standard and advanced datasets. In the standard datasets, no two glycosylated proteins are more than 40% similar; the advanced datasets are highly non-redundant, with no two glycosite patterns (as defined in the methods) sharing more than 60% similarity. Further, based on our results with several algorithms developed using different machine-learning techniques, we found the Support Vector Machine (SVM) to be the optimal tool for developing glycosite prediction models. Accordingly, using our more stringent and non-redundant advanced datasets, the SVM-based models developed in this study achieved prediction accuracies of 84.26%, 86.87% and 91.43%, with corresponding Matthews correlation coefficients (MCC) of 0.54, 0.20 and 0.78, for N-, O- and C-linked glycosites, respectively. The best-performing models trained on the advanced datasets were then implemented as a user-friendly web server, GlycoEP (http://www.imtech.res.in/raghava/glycoep/). Additionally, this server provides prediction models developed on the standard datasets and allows users to scan for sequons in input protein sequences.
Bibliography:	Conceived and designed the experiments: GPSR. Performed the experiments: JSC GPSR. Analyzed the data: JSC AR GPSR. Contributed reagents/materials/analysis tools: GPSR. Wrote the paper: JSC AR GPSR. Competing Interests: The authors have declared that no competing interests exist.
DOI:	10.1371/journal.pone.0067008
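
Note on sequon scanning: the summary mentions that the server lets users scan input protein sequences for sequons. As a rough illustration of what such a scan can look like, the sketch below searches for the canonical N-linked sequon N-X-S/T, where X is any residue except proline. This is a minimal sketch under stated assumptions and not GlycoEP's implementation: the record does not say which sequon definitions the server uses, and the function name `find_n_sequons` and the demo sequence are hypothetical.

```python
import re

# Assumption: the canonical N-linked sequon, Asn - X - Ser/Thr with X != Pro.
# The pattern is wrapped in a lookahead so that overlapping sequons
# (for example "NNST", which contains both "NNS" and "NST") are all reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_n_sequons(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based position of the Asn, matched tripeptide) per sequon."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(seq)]


if __name__ == "__main__":
    demo = "MKNGTAANPSLLNNSTQ"  # made-up sequence, for illustration only
    for position, motif in find_n_sequons(demo):
        print(position, motif)
```

A sequon match only marks a candidate site. Whether the asparagine is actually glycosylated is the question that the SVM models described in the summary are trained to answer.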