GenoGAM 2.0: scalable and efficient implementation of genome-wide generalized additive models for gigabase-scale genomes

Bibliographic details
Published in:	BMC Bioinformatics, Vol. 19, No. 1, p. 247 - 9
Main authors:	Stricker, Georg, Galinier, Mathilde, Gagneur, Julien
Format:	Journal Article
Language:	English
Published:	London BioMed Central 27.06.2018 BioMed Central Ltd Springer Nature B.V BMC
Subjects:	Algorithms; Applied mathematics; Binomial distribution; Bioinformatics; Biomedical and Life Sciences; ChIP-Seq; Cholesky factorization; Chromatin; Computational Biology/Bioinformatics; Computer Appl. in Life Sciences; Computer memory; Data processing; Factorial design; Generalized additive models; Genome-wide analysis; Genomes; Genomic libraries; Genomics; Genomics - methods; Humans; Life Sciences; Mathematical models; Microarrays; Parameters; Proteins; Results and data; Scale (ratio); Software; Sparse inverse subset algorithm; Sparsity; Statistical analysis; Statistical models; Transcription factors; Yeast
ISSN:	1471-2105

Description
Summary:	Background: GenoGAM (Genome-wide generalized additive models) is a powerful statistical modeling tool for the analysis of ChIP-Seq data with flexible factorial design experiments. However, the large runtime and memory requirements of its current implementation prohibit its application to gigabase-scale genomes such as mammalian genomes. Results: Here we present GenoGAM 2.0, a scalable and efficient implementation that is 2 to 3 orders of magnitude faster than the previous version. This is achieved by exploiting the sparsity of the model, using the SuperLU direct solver for parameter fitting and sparse Cholesky factorization together with the sparse inverse subset algorithm for computing standard errors. Furthermore, the HDF5 library is employed to store data efficiently on the hard drive, reducing the memory footprint while keeping I/O low. Whole-genome fits for human ChIP-Seq datasets (ca. 300 million parameters) could be obtained in less than 9 hours on a standard 60-core server. GenoGAM 2.0 is implemented as an open source R package and is currently available on GitHub. A Bioconductor release of the new version is in preparation. Conclusions: We have vastly improved the performance of the GenoGAM framework, opening up its application to all types of organisms. Moreover, our algorithmic improvements for fitting large GAMs could be of interest to the statistical community beyond the genomics field.
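The sparsity argument in the summary can be illustrated with a toy example. The sketch below is not the GenoGAM code (which is an R package); it assumes a simplified penalized least-squares smoother with an identity design matrix and a second-order difference penalty, and the function name `fit_penalized_smoother` and the parameter `lam` are hypothetical. It uses SciPy's `splu`, which wraps SuperLU, for the fit. The diagonal of the inverse is obtained here by a dense solve, which is only a small-scale stand-in for the sparse inverse subset algorithm named in the summary.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu


def fit_penalized_smoother(y, lam=10.0):
    """Solve (I + lam * D'D) beta = y with a sparse direct solver.

    Assumption: identity design matrix and a second-order difference
    penalty, chosen only to give a banded (sparse) system matrix.
    """
    n = y.size
    # Second-order difference operator, shape (n - 2, n), three nonzeros per row.
    D = sp.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n), format="csc")
    # Penalized system matrix; it stays banded, so factorization is cheap.
    H = (sp.identity(n, format="csc") + lam * (D.T @ D)).tocsc()
    lu = splu(H)  # SuperLU factorization via SciPy
    beta = lu.solve(y)
    # Stand-in for the sparse inverse subset step: only diag(H^-1) is needed
    # for pointwise standard errors. A dense solve is acceptable at toy size
    # but does not scale and is NOT the algorithm named in the summary.
    diag_inv = np.diag(lu.solve(np.eye(n)))
    return beta, H, diag_inv


# Toy usage: smooth a noisy signal of 200 positions.
rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0.0, 3.0 * np.pi, 200))
y = signal + rng.normal(scale=0.3, size=signal.size)
beta, H, diag_inv = fit_penalized_smoother(y, lam=50.0)
```

Under these assumptions, pointwise standard errors would be proportional to the square root of `diag_inv`; at genome scale, the dense solve would have to be replaced by a method that computes only the required entries of the inverse.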
DOI:	10.1186/s12859-018-2238-7