Identification of protein coding regions in RNA transcripts

Bibliographic Details
Published in:	Nucleic acids research Vol. 43; no. 12; p. e78
Main Authors:	Tang, Shiyuyun; Lomsadze, Alexandre; Borodovsky, Mark
Format:	Journal Article
Language:	English
Published:	England: Oxford University Press, 13.07.2015
Subjects:	Algorithms; Animals; Arabidopsis - genetics; Drosophila melanogaster - genetics; Gene Expression Profiling; Genes; High-Throughput Nucleotide Sequencing - methods; Methods Online; Mice; Open Reading Frames; Peptide Chain Initiation, Translational; RNA, Messenger - chemistry; Schizosaccharomyces - genetics; Sequence Analysis, RNA - methods; Software
ISSN:	0305-1048, 1362-4962

Description
Summary:	Massively parallel sequencing of RNA transcripts by next-generation technology (RNA-Seq) generates critically important data for eukaryotic gene discovery. Gene finding in transcripts can be done by statistical (alignment-free) as well as by alignment-based methods. We describe a new tool, GeneMarkS-T, for ab initio identification of protein-coding regions in RNA transcripts. The algorithm parameters are estimated by unsupervised training, which makes manual curation of training sets unnecessary. We demonstrate that (i) the unsupervised training is robust to transcript assembly errors and (ii) the accuracy of GeneMarkS-T in identifying protein-coding regions and, in particular, in predicting translation initiation sites in modelled as well as in assembled transcripts compares favourably with that of existing methods.
DOI:	10.1093/nar/gkv227
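
Illustration:	The summary describes ab initio identification of protein-coding regions in transcripts. To make that task concrete, the following is a minimal sketch of a naive baseline: it scans the three forward reading frames of a transcript and reports the longest ATG-to-stop open reading frame. It is not GeneMarkS-T's algorithm, which relies on statistical models estimated by unsupervised training. The function name `longest_orf`, the `min_codons` threshold, the standard stop codons, and the forward-strand-only scan are all assumptions made for illustration.

```python
# Naive longest-ORF scan; an illustrative baseline, NOT the GeneMarkS-T method.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def longest_orf(transcript, min_codons=30):
    """Return (start, end, frame) of the longest ATG-to-stop ORF, or None.

    Coordinates are 0-based and end-exclusive; the stop codon is included
    in the span and counted in the ORF length. Only the given strand is
    scanned (hypothetical simplification).
    """
    seq = transcript.upper().replace("U", "T")  # accept RNA or DNA letters
    best = None
    for frame in range(3):
        start = None  # position of the first ATG since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                length = end - start
                long_enough = length // 3 >= min_codons
                if long_enough and (best is None or length > best[1] - best[0]):
                    best = (start, end, frame)
                start = None  # resume the search after this stop codon
    return best
```

Taking the first in-frame ATG as the start is exactly the kind of heuristic that a trained model of translation initiation sites is meant to improve on; ORFs lacking an in-frame stop (for example in truncated transcripts) are ignored by this sketch.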