Sufficient statistics and expectation maximization algorithms in phylogenetic tree models

Bibliographic details
Published in: Bioinformatics (Oxford, England), Volume 27, Issue 17, pp. 2346-2353
Main author: Kiryu, Hisanori
Format: Journal Article
Language: English
Publication details: Oxford: Oxford University Press, 1 September 2011
ISSN: 1367-4803, 1367-4811
Description
Summary:
Motivation: Measuring evolutionary conservation is a routine step in the identification of functional elements in genome sequences. Although a number of studies have proposed methods that use continuous-time Markov models (CTMMs) to find evolutionarily constrained elements, their probabilistic structures have been less frequently investigated.
Results: In this article, we investigate a sufficient statistic for CTMMs. The statistic is composed of the fractional duration of nucleotide characters over evolutionary time, Fd, and the number of substitutions occurring in phylogenetic trees, Ns. We first derive basic properties of the sufficient statistic. Then, we derive an expectation maximization (EM) algorithm for estimating the parameters of a phylogenetic model, which iteratively computes the expectation values of the sufficient statistic. We show that the EM algorithm exhibits much faster convergence than other optimization methods that use numerical gradient descent algorithms. Finally, we investigate the genome-wide distribution of the fractional duration time Fd which, unlike the number of substitutions Ns, has rarely been investigated. We show that Fd carries evolutionary information that is distinct from that in Ns, which may be useful for detecting novel types of evolutionary constraints existing in the human genome.
Availability: The C++ source code of the ‘Fdur’ software is available at http://www.ncrna.org/software/fdur/
Contact: kiryu-h@k.u-tokyo.ac.jp
Supplementary information: Supplementary data are available at Bioinformatics online.
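To make the two components of the sufficient statistic concrete, the following is a minimal Python sketch, not the article's algorithm and not the Fdur code. It assumes a Jukes-Cantor substitution model with a hypothetical per-pair rate `mu`, a single branch of length `t` rather than a full phylogenetic tree, and plain numerical integration. Under those assumptions it computes the expected fractional duration of each nucleotide and the expected number of substitutions, conditioned on the bases at the two ends of the branch. All function and parameter names are illustrative.

```python
import math

BASES = "ACGT"


def jc_prob(x, y, t, mu):
    """Jukes-Cantor transition probability P(y at time t | x at time 0).

    Assumption: every off-diagonal rate equals mu (hypothetical parameter).
    """
    decay = math.exp(-4.0 * mu * t)
    return 0.25 + 0.75 * decay if x == y else 0.25 - 0.25 * decay


def simpson(f, lo, hi, n=200):
    """Composite Simpson rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3.0


def expected_stats(a, b, t, mu):
    """Expected statistics on one branch, given endpoint bases a and b.

    Returns (frac, subs): frac maps each base to its expected fraction of
    the branch duration; subs is the expected number of substitutions.
    """
    p_ab = jc_prob(a, b, t, mu)

    # Expected dwelling time in base i: integrate P(a->i, s) * P(i->b, t-s).
    frac = {}
    for i in BASES:
        dwell = simpson(
            lambda s: jc_prob(a, i, s, mu) * jc_prob(i, b, t - s, mu), 0.0, t
        )
        frac[i] = dwell / (p_ab * t)

    # Expected i->j jump count: rate mu times integral of
    # P(a->i, s) * P(j->b, t-s), summed over all ordered pairs i != j.
    subs = 0.0
    for i in BASES:
        for j in BASES:
            if i != j:
                subs += mu * simpson(
                    lambda s: jc_prob(a, i, s, mu) * jc_prob(j, b, t - s, mu),
                    0.0,
                    t,
                )
    return frac, subs / p_ab
```

Calling `expected_stats("A", "G", 0.5, 1.0)` returns a dictionary of four fractions that sum to one, together with an expected substitution count of at least one, since the endpoints differ. In an EM setting, conditional expectations of this kind would be accumulated in the E-step and used to re-estimate rates in the M-step; how the article does this on trees is described in the paper itself.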
DOI:10.1093/bioinformatics/btr420