Sufficient statistics and expectation maximization algorithms in phylogenetic tree models
| Published in: | Bioinformatics (Oxford, England) Vol. 27; no. 17; pp. 2346-2353 |
|---|---|
| Main author: | |
| Format: | Journal Article |
| Language: | English |
| Publication details: | Oxford: Oxford University Press, 1 September 2011 |
| Subjects: | |
| ISSN: | 1367-4803, 1367-4811 |
| Summary: | Motivation: Measuring evolutionary conservation is a routine step in the identification of functional elements in genome sequences. Although a number of studies have proposed methods that use continuous-time Markov models (CTMMs) to find evolutionarily constrained elements, their probabilistic structures have been less frequently investigated.
Results: In this article, we investigate a sufficient statistic for CTMMs. The statistic is composed of the fractional duration of nucleotide characters over evolutionary time, Fd, and the number of substitutions occurring in phylogenetic trees, Ns. We first derive basic properties of the sufficient statistic. Then, we derive an expectation maximization (EM) algorithm for estimating the parameters of a phylogenetic model, which iteratively computes the expectation values of the sufficient statistic. We show that the EM algorithm exhibits much faster convergence than other optimization methods that use numerical gradient descent algorithms. Finally, we investigate the genome-wide distribution of fractional duration time Fd which, unlike the number of substitutions Ns, has rarely been investigated. We show that Fd has evolutionary information that is distinct from that in Ns, which may be useful for detecting novel types of evolutionary constraints existing in the human genome.
Availability: The C++ source code of the ‘Fdur’ software is available at http://www.ncrna.org/software/fdur/
Contact: kiryu-h@k.u-tokyo.ac.jp
Supplementary information: Supplementary data are available at Bioinformatics online. |
|---|---|
| DOI: | 10.1093/bioinformatics/btr420 |
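
The summary describes a sufficient statistic built from the fractional duration of each nucleotide (Fd) and the number of substitutions (Ns). As a rough illustration only, the sketch below computes these two quantities for a single, fully observed substitution history on one branch, and then applies the standard complete-data maximum-likelihood estimate for a continuous-time Markov chain (rate from a to b = number of a-to-b substitutions divided by total time spent in a). This is not the Fdur implementation: the function names, the uniform rate of 0.5, and the single-branch, fully observed setting are all assumptions made for the example. The EM algorithm in the article instead works with expected values of these statistics over a phylogenetic tree.

```python
import random

STATES = "ACGT"

def simulate_path(rates, start, total_time, rng):
    """Simulate a CTMC history (hypothetical helper, not from Fdur).

    rates[a][b] is the substitution rate from a to b (a != b).
    Returns a list of (state, dwell_time) segments covering total_time.
    """
    segments = []
    state, elapsed = start, 0.0
    while True:
        exit_rate = sum(rates[state].values())
        dwell = rng.expovariate(exit_rate)
        if elapsed + dwell >= total_time:
            # Truncate the last segment at the end of the branch.
            segments.append((state, total_time - elapsed))
            return segments
        segments.append((state, dwell))
        elapsed += dwell
        # Pick the next state with probability proportional to its rate.
        targets = list(rates[state])
        weights = [rates[state][b] for b in targets]
        state = rng.choices(targets, weights=weights)[0]

def sufficient_statistics(segments):
    """Return (Fd, Ns, dwell times, substitution counts) for one history."""
    total = sum(d for _, d in segments)
    dwell = {a: 0.0 for a in STATES}
    counts = {}
    for i, (state, d) in enumerate(segments):
        dwell[state] += d
        if i + 1 < len(segments):
            pair = (state, segments[i + 1][0])
            counts[pair] = counts.get(pair, 0) + 1
    fd = {a: dwell[a] / total for a in STATES}  # fractional duration
    ns = sum(counts.values())                   # number of substitutions
    return fd, ns, dwell, counts

def estimate_rates(dwell, counts):
    """Complete-data MLE: rate(a -> b) = N_ab / T_a."""
    return {(a, b): n / dwell[a] for (a, b), n in counts.items()}

# Assumed toy model: every substitution has rate 0.5.
toy_rates = {a: {b: 0.5 for b in STATES if b != a} for a in STATES}
history = simulate_path(toy_rates, "A", 2000.0, random.Random(0))
fd, ns, dwell, counts = sufficient_statistics(history)
print("Fd:", {a: round(f, 3) for a, f in fd.items()})
print("Ns:", ns)
print("estimated A->C rate:", round(estimate_rates(dwell, counts)[("A", "C")], 3))
```

In an EM setting, the observed dwell times and counts used here would be replaced by their conditional expectations given the aligned sequences at the leaves, and the same ratio would serve as the update step.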