Gradients Do Grow on Trees: A Linear-Time O(N)-Dimensional Gradient for Statistical Phylogenetics


Bibliographic Details
Published in:	Molecular Biology and Evolution, Vol. 37, no. 10, pp. 3047-3060
Main Authors:	Ji, Xiang; Zhang, Zhenyu; Holbrook, Andrew; Nishimura, Akihiko; Baele, Guy; Rambaut, Andrew; Lemey, Philippe; Suchard, Marc A
Format:	Journal Article
Language:	English
Published:	United States: Oxford University Press, 01.10.2020
Subjects:	Algorithms; Bayesian analysis; Computer applications; Dengue fever; Evolution, Molecular; Flavivirus - genetics; Heterogeneity; Inference; Lassa virus - genetics; Markov processes; Methods; Models, Genetic; Next-generation sequencing; Phylogenetics; Phylogeny; Process parameters; Sequences; Statistical analysis; Statistics; Vector-borne diseases; Viruses; West Nile virus; linear-time gradient algorithm; random-effects molecular clock model; Bayesian inference; maximum likelihood
ISSN:	0737-4038, 1537-1719

Description
Summary:	Calculation of the log-likelihood stands as the computational bottleneck for many statistical phylogenetic algorithms. Even worse is its gradient evaluation, often used to target regions of high probability. Order O(N)-dimensional gradient calculations based on the standard pruning algorithm require O(N^2) operations, where N is the number of sampled molecular sequences. With the advent of high-throughput sequencing, recent phylogenetic studies have analyzed hundreds to thousands of sequences, with an apparent trend toward even larger data sets as a result of advancing technology. Such large-scale analyses challenge phylogenetic reconstruction by requiring inference on larger sets of process parameters to model the increasing data heterogeneity. To make these analyses tractable, we present a linear-time algorithm for O(N)-dimensional gradient evaluation and apply it to general continuous-time Markov processes of sequence substitution on a phylogenetic tree without a need to assume either stationarity or reversibility. We apply this approach to learn the branch-specific evolutionary rates of three pathogenic viruses: West Nile virus, Dengue virus, and Lassa virus. Our proposed algorithm significantly improves inference efficiency with a 126- to 234-fold increase in maximum-likelihood optimization and a 16- to 33-fold computational performance increase in a Bayesian framework.
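Example:	The summary states the complexity result but not the mechanism. The sketch below illustrates one common way to obtain every branch derivative in linear time: a post-order (pruning) pass followed by a pre-order pass, so that each branch is visited a constant number of times. It is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a binary tree, a single site, and a two-state symmetric substitution model with closed-form transition probabilities. All names (`log_likelihood_and_gradient`, `children`, `lengths`, `tip_state`) are hypothetical.

```python
import math

PI = (0.5, 0.5)  # assumed root distribution for the two-state toy model


def transition(t):
    """Transition matrix P(t) of a symmetric two-state chain with unit rate."""
    e = math.exp(-2.0 * t)
    same, diff = 0.5 * (1.0 + e), 0.5 * (1.0 - e)
    return ((same, diff), (diff, same))


def transition_derivative(t):
    """Element-wise derivative dP(t)/dt of the matrix above."""
    e = math.exp(-2.0 * t)
    return ((-e, e), (e, -e))


def matvec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def vecmat(v, m):
    return [v[0] * m[0][0] + v[1] * m[1][0], v[0] * m[0][1] + v[1] * m[1][1]]


def log_likelihood_and_gradient(children, lengths, tip_state, root):
    """Return (log-likelihood, {node: d logL / d branch length above node}).

    children:  internal node -> (left child, right child)
    lengths:   non-root node -> length of the branch above it
    tip_state: tip -> observed state (0 or 1)
    """
    # Build an ordering in which every parent precedes its children.
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, ()))

    # Post-order pass: post[n][s] = P(data below n | state s at n).
    post, msg = {}, {}
    for node in reversed(order):
        if node in children:
            left, right = children[node]
            post[node] = [msg[left][s] * msg[right][s] for s in range(2)]
        else:
            post[node] = [1.0 if s == tip_state[node] else 0.0 for s in range(2)]
        if node != root:
            # msg[n][s] = P(data below n | state s at the parent of n).
            msg[node] = matvec(transition(lengths[node]), post[node])

    lik = sum(PI[s] * post[root][s] for s in range(2))

    # Pre-order pass: top[n][s] = P(data outside the subtree of n, state s at n).
    top, grad = {root: list(PI)}, {}
    for node in order:
        if node not in children:
            continue
        left, right = children[node]
        for child, sibling in ((left, right), (right, left)):
            # up[s] = P(data outside the subtree of child, state s at the parent).
            up = [top[node][s] * msg[sibling][s] for s in range(2)]
            top[child] = vecmat(up, transition(lengths[child]))
            # The likelihood is up . P(t) . post[child]; differentiate P only.
            d_msg = matvec(transition_derivative(lengths[child]), post[child])
            grad[child] = sum(up[s] * d_msg[s] for s in range(2)) / lik
    return math.log(lik), grad


# Toy four-tip tree ((t1,t2)A,(t3,t4)B)root with made-up branch lengths.
children = {"root": ("A", "B"), "A": ("t1", "t2"), "B": ("t3", "t4")}
lengths = {"A": 0.1, "B": 0.3, "t1": 0.2, "t2": 0.05, "t3": 0.4, "t4": 0.15}
tip_state = {"t1": 0, "t2": 1, "t3": 0, "t4": 0}
log_lik, grad = log_likelihood_and_gradient(children, lengths, tip_state, "root")
```

Both passes touch each node once, so the full gradient costs a constant multiple of one likelihood evaluation. By contrast, re-running the pruning pass once per branch, as in a finite-difference scheme, costs one full traversal per branch and therefore scales quadratically in the number of tips.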
DOI:	10.1093/molbev/msaa130