A (fire)cloud-based DNA methylation data preprocessing and quality control platform

Bibliographic details
Published in:	BMC Bioinformatics Vol. 20; no. 1; pp. 160 - 5
Main authors:	Kangeyan, Divy; Dunford, Andrew; Iyer, Sowmya; Stewart, Chip; Hanna, Megan; Getz, Gad; Aryee, Martin J.
Format:	Journal Article
Language:	English
Publication details:	London BioMed Central 29.03.2019 BioMed Central Ltd Springer Nature B.V BMC
Subjects:	Algorithms; Analysis and modelling of complex systems; Base sequence; Bioinformatics; Bioinformatics workflows; Biomedical and Life Sciences; Bisulfite; Cancer; Cloud computing; Computational biology; Computational Biology/Bioinformatics; Computer Appl. in Life Sciences; Containers; CpG islands; Criminal investigation; Data processing; Datasets; Deoxyribonucleic acid; DNA; DNA fingerprinting; DNA methylation; DNA sequencing; Epigenetics; Gene expression; Genetic research; Genomes; Genomics; Integration; Life Sciences; Methods; Methylation; Microarrays; Pipelines; Pipelining (computers); Preprocessing; Quality control; Quality control analysis; Reproducibility; Software; Sulfites; Visualization (Computer); Workflow; Workflow software; United States
ISSN:	1471-2105

Description
Summary:	Background: Bisulfite sequencing allows base-pair resolution profiling of DNA methylation and has recently been adapted for use in single cells. Analyzing these data, including making comparisons with existing data, remains challenging due to the scale of the data and differences in preprocessing methods between published datasets.
Results: We present a set of preprocessing pipelines for bisulfite sequencing DNA methylation data that include a new R/Bioconductor package, scmeth, for a series of efficient QC analyses of large datasets. The pipelines go from raw data to CpG-level methylation estimates and can be run, with identical results, on a single computer, in an HPC cluster, or on Google Cloud Compute resources. These pipelines are designed to allow users to 1) ensure reproducibility of analyses, 2) achieve scalability to large whole genome datasets with 100 GB+ of raw data per sample and to single-cell datasets with thousands of cells, 3) enable integration and comparison between user-provided data and publicly available data, as all samples can be processed through the same pipeline, and 4) access best-practice analysis pipelines. Pipelines are provided for whole genome bisulfite sequencing (WGBS), reduced representation bisulfite sequencing (RRBS) and hybrid selection (capture) bisulfite sequencing (HSBS).
Conclusions: The workflows produce data quality metrics, visualization tracks, and aggregated output for further downstream analysis. Optional use of cloud computing resources facilitates analysis of large datasets and integration with existing methylome profiles. The workflow design principles are applicable to other genomic data types.
DOI:	10.1186/s12859-019-2750-4