fastSTRUCTURE: Variational Inference of Population Structure in Large SNP Data Sets

Bibliographic details
Published in: Genetics (Austin), Vol. 197, No. 2, pp. 573-589
Main authors: Raj, Anil; Stephens, Matthew; Pritchard, Jonathan K
Format: Journal Article
Language: English
Published: United States: Genetics Society of America, 1 June 2014
ISSN: 1943-2631, 0016-6731
Description
Summary: Tools for estimating population structure from genetic data are now used in a wide variety of applications in population genetics. However, inferring population structure in large modern data sets imposes severe computational challenges. Here, we develop efficient algorithms for approximate inference of the model underlying the STRUCTURE program using a variational Bayesian framework. Variational methods pose the problem of computing relevant posterior distributions as an optimization problem, allowing us to build on recent advances in optimization theory to develop fast inference tools. In addition, we propose useful heuristic scores to identify the number of populations represented in a data set and a new hierarchical prior to detect weak population structure in the data. We test the variational algorithms on simulated data and illustrate using genotype data from the CEPH–Human Genome Diversity Panel. The variational algorithms are almost two orders of magnitude faster than STRUCTURE and achieve accuracies comparable to those of ADMIXTURE. Furthermore, our results show that the heuristic scores for choosing model complexity provide a reasonable range of values for the number of populations represented in the data, with minimal bias toward detecting structure when it is very weak. Our algorithm, fastSTRUCTURE, is freely available online at http://pritchardlab.stanford.edu/structure.html.
Supporting information is available online at http://www.genetics.org/lookup/suppl/doi:10.1534/genetics.114.164350/-/DC1.
Available freely online through the author-supported open access option.
DOI: 10.1534/genetics.114.164350