Fast and accurate genotype imputation in genome-wide association studies through pre-phasing

Bibliographic details
Published in: Nature Genetics, Volume 44, Issue 8, pp. 955-959
Main authors: Howie, Bryan; Fuchsberger, Christian; Stephens, Matthew; Marchini, Jonathan; Abecasis, Gonçalo R
Format: Journal Article
Language: English
Publication details: New York, Nature Publishing Group US, 1 August 2012
Nature Publishing Group
ISSN: 1061-4036, 1546-1718
Description
Summary: Gonçalo Abecasis, Jonathan Marchini and colleagues report a pre-phasing strategy for genotype imputation in GWAS, which they show maintains accuracy while substantially lowering computational costs. Their approach has been implemented in both MACH and IMPUTE 2.0 software. The 1000 Genomes Project and disease-specific sequencing efforts are producing large collections of haplotypes that can be used as reference panels for genotype imputation in genome-wide association studies (GWAS). However, imputing from large reference panels with existing methods imposes a high computational burden. We introduce a strategy called 'pre-phasing' that maintains the accuracy of leading methods while reducing computational costs. We first statistically estimate the haplotypes for each individual within the GWAS sample (pre-phasing) and then impute missing genotypes into these estimated haplotypes. This reduces the computational cost because (i) the GWAS samples must be phased only once, whereas standard methods would implicitly repeat phasing with each reference panel update, and (ii) it is much faster to match a phased GWAS haplotype to one reference haplotype than to match two unphased GWAS genotypes to a pair of reference haplotypes. We implemented our approach in the MaCH and IMPUTE2 frameworks, and we tested it on data sets from the Wellcome Trust Case Control Consortium 2 (WTCCC2), the Genetic Association Information Network (GAIN), the Women's Health Initiative (WHI) and the 1000 Genomes Project. This strategy will be particularly valuable for repeated imputation as reference panels evolve.
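
Point (ii) of the summary can be illustrated with a toy sketch. The code below is not the hidden Markov model used in MaCH or IMPUTE2; it is a brute-force nearest-match on a made-up four-haplotype panel, and the function names `match_haplotype` and `match_genotype` are hypothetical. It only shows why the search space differs: a phased haplotype is compared against each of the H reference haplotypes, whereas an unphased genotype must be compared against every unordered pair, of which there are H(H+1)/2.

```python
from itertools import combinations_with_replacement


def match_haplotype(hap, ref):
    """Closest single reference haplotype to a phased haplotype.

    Returns (best_index, number_of_candidates_scored).
    """
    scores = [sum(a != b for a, b in zip(hap, r)) for r in ref]
    return scores.index(min(scores)), len(ref)


def match_genotype(geno, ref):
    """Closest unordered pair of reference haplotypes to an unphased genotype.

    Genotypes are allele counts (0, 1 or 2) per marker.
    Returns (best_pair, number_of_candidates_scored).
    """
    best, best_score, scored = None, None, 0
    for i, j in combinations_with_replacement(range(len(ref)), 2):
        # Mismatches between the observed genotype and the pair's allele sum.
        score = sum(g != a + b for g, a, b in zip(geno, ref[i], ref[j]))
        scored += 1
        if best_score is None or score < best_score:
            best, best_score = (i, j), score
    return best, scored


# Made-up reference panel: 4 haplotypes, 4 markers (0 = ref allele, 1 = alt).
ref = [
    [0, 0, 1, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 0, 1],
]

hap = [1, 0, 1, 1]   # a phased haplotype, identical to ref[1]
geno = [1, 0, 2, 1]  # an unphased genotype, equal to ref[0] + ref[1]

print(match_haplotype(hap, ref))
print(match_genotype(geno, ref))
```

With H = 4 the haplotype search scores 4 candidates and the genotype search scores 10; the gap widens quadratically as the reference panel grows, which is the intuition behind the speed-up the summary describes.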
These authors contributed equally to this work
DOI: 10.1038/ng.2354