An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data

Bibliographic details
Published in: Genome Research, Vol. 25, No. 6, p. 918
Main authors: Jun, Goo; Wing, Mary Kate; Abecasis, Gonçalo R; Kang, Hyun Min
Format: Journal Article
Language: English
Published: United States, June 1, 2015
ISSN: 1549-5469
Description
Summary: The analysis of next-generation sequencing data is computationally and statistically challenging because of the massive volume of data and imperfect data quality. We present GotCloud, a pipeline for efficiently detecting and genotyping high-quality variants from large-scale sequencing data. GotCloud automates sequence alignment, sample-level quality control, variant calling, filtering of likely artifacts using machine-learning techniques, and genotype refinement using haplotype information. The pipeline can process thousands of samples in parallel and requires less computational resources than current alternatives. Experiments with whole-genome and exome-targeted sequence data generated by the 1000 Genomes Project show that the pipeline provides effective filtering against false positive variants and high power to detect true variants. Our pipeline has already contributed to variant detection and genotyping in several large-scale sequencing projects, including the 1000 Genomes Project and the NHLBI Exome Sequencing Project. We hope it will now prove useful to many medical sequencing studies.
DOI: 10.1101/gr.176552.114