Phylogenetic mixtures and linear invariants for equal input models

Bibliographic details
Published in: Journal of mathematical biology, Volume 74, Issue 5, pp. 1107-1138
Main authors: Casanellas, Marta; Steel, Mike
Format: Journal Article, Publication
Language: English
Published: Berlin/Heidelberg: Springer Berlin Heidelberg, 1 April 2017
Springer Nature B.V.
ISSN: 0303-6812, 1432-1416
Description
Summary: The reconstruction of phylogenetic trees from molecular sequence data relies on modelling site substitutions by a Markov process, or a mixture of such processes. In general, allowing mixed processes can result in different tree topologies becoming indistinguishable from the data, even for infinitely long sequences. However, when the underlying Markov process supports linear phylogenetic invariants, then provided these are sufficiently informative, the identifiability of the tree topology can be restored. In this paper, we investigate a class of processes that support linear invariants once the stationary distribution is fixed, the ‘equal input model’. This model generalizes the ‘Felsenstein 1981’ model (and thereby the Jukes–Cantor model) from four states to an arbitrary number of states (finite or infinite), and it can also be described by a ‘random cluster’ process. We describe the structure and dimension of the vector spaces of phylogenetic mixtures and of linear invariants for any fixed phylogenetic tree (and for all trees, the so-called ‘model invariants’), on any number n of leaves. We also provide a precise description of the space of mixtures and linear invariants for the special case of n = 4 leaves. By combining techniques from discrete random processes and (multi-)linear algebra, our results build on a classic result that was first established by James Lake (Mol Biol Evol 4:167–191, 1987).
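
Illustrative note (not part of the catalog record or the paper's text): the equal input model named in the summary can be sketched numerically. The snippet below assumes the standard parameterization in which the substitution rate from state i to a different state j is proportional to the stationary frequency pi[j], which gives the closed-form transition matrix P(t) = e^{-mu t} I + (1 - e^{-mu t}) 1 pi^T. The function name `equal_input_transition` and the rate parameter `mu` are hypothetical names chosen for this sketch.

```python
import math

def equal_input_transition(pi, t, mu=1.0):
    """Transition matrix of an equal input model on len(pi) states.

    Assumed parameterization: rate i -> j (i != j) equals mu * pi[j], so
    P(t) = exp(-mu*t) * I + (1 - exp(-mu*t)) * (matrix whose rows all equal pi).
    """
    if any(p < 0 for p in pi) or abs(sum(pi) - 1.0) > 1e-9:
        raise ValueError("pi must be a probability distribution")
    stay = math.exp(-mu * t)  # probability that no substitution event occurs
    k = len(pi)
    return [
        [(stay if i == j else 0.0) + (1.0 - stay) * pi[j] for j in range(k)]
        for i in range(k)
    ]

# Four states with unequal frequencies: the Felsenstein 1981 special case.
P_f81 = equal_input_transition([0.1, 0.2, 0.3, 0.4], t=0.5)

# Four states with uniform frequencies: the Jukes-Cantor special case.
P_jc = equal_input_transition([0.25] * 4, t=0.5)
```

Under this assumed form, each row of the matrix sums to 1 and `pi` is stationary. With probability 1 - e^{-mu t} the state at the far end of an edge is redrawn from `pi` regardless of the starting state, which is one way to read the ‘random cluster’ description in the summary.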
DOI: 10.1007/s00285-016-1055-8