BigMPI4py: Python Module for Parallelization of Big Data Objects Discloses Germ Layer Specific DNA Demethylation Motifs

Detailed bibliography
Published in: IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 19, No. 3, pp. 1507-1522
Main authors: Ascension, Alex M.; Arauzo-Bravo, Marcos J.
Medium: Journal Article
Language: English
Published: United States, IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 1 May 2022
ISSN: 1545-5963, 1557-9964
Description
Abstract: Parallelization in Python integrates the Message Passing Interface (MPI) via the mpi4py module. Since mpi4py does not support parallelization of objects larger than 2^31 bytes, we developed BigMPI4py, a Python module that wraps mpi4py and supports object sizes beyond this boundary. BigMPI4py automatically determines the optimal object distribution strategy and uses vectorized methods, achieving higher parallelization efficiency. BigMPI4py facilitates the use of Python for Big Data applications on multicore workstations and High Performance Computing systems. We use BigMPI4py to speed up the search for germ line specific de novo DNA methylated/unmethylated motifs in the 59 whole genome bisulfite sequencing (WGBS) DNA methylation samples from 27 human tissues of the ENCODE project. We developed a parallel implementation of the Kruskal-Wallis test to find CpGs with differential methylation across germ layers. The parallel evaluation of the significance of 55 million CpGs achieved a 22x speedup with 25 cores, allowing us to efficiently identify one set of hypermethylated genes in ectoderm- and mesoderm-related tissues and another in endoderm-related tissues, and finally to discover germ layer specific DNA demethylation motifs. Our results indicate that the DNA methylation signal provides more information about the demethylated state than about the methylated state. BigMPI4py is available at https://www.arauzolab.org/tools/bigmpi4py and https://gitlab.com/alexmascension/bigmpi4py, and the Jupyter Notebook with the WGBS analysis at https://gitlab.com/alexmascension/wgbs-analysis.
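The record does not include any of BigMPI4py's code. What follows is a rough, standard-library-only sketch of the general ideas the abstract names. It is not the module's actual API. It assumes the 2^31-byte boundary comes from MPI count arguments being 32-bit signed C ints, which cap a single message at 2**31 - 1 elements. All names (`split_payload`, `join_payload`, `partition_rows`, `kruskal_h`) are hypothetical. The sketch covers three steps: cutting a serialized object into chunks below that cap, dividing rows (for example CpG sites) evenly across workers, and computing the tie-corrected Kruskal-Wallis H statistic that each worker might evaluate per row.

```python
import pickle

# Assumption: MPI counts are C ints, so one message holds at most 2**31 - 1 items.
MPI_INT_MAX = 2**31 - 1


def split_payload(obj, max_chunk=MPI_INT_MAX):
    """Pickle obj and cut the bytes into chunks of at most max_chunk bytes.

    Hypothetical helper: each chunk could be sent as a separate MPI message,
    keeping every call below the 32-bit count limit.
    """
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    data = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    return [data[i:i + max_chunk] for i in range(0, len(data), max_chunk)]


def join_payload(chunks):
    """Concatenate chunks produced by split_payload and unpickle the result."""
    return pickle.loads(b"".join(chunks))


def partition_rows(n_rows, n_workers):
    """Return one (start, stop) range per worker; sizes differ by at most 1."""
    base, extra = divmod(n_rows, n_workers)
    bounds, start = [], 0
    for rank in range(n_workers):
        stop = start + base + (1 if rank < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def kruskal_h(*groups):
    """Kruskal-Wallis H statistic with the standard correction for ties.

    The p-value would come from a chi-squared distribution with
    len(groups) - 1 degrees of freedom (not computed here).
    """
    # Pool all observations, remembering which group each came from.
    values = sorted(((v, g) for g, grp in enumerate(groups) for v in grp),
                    key=lambda t: t[0])
    n = len(values)
    rank_sums = [0.0] * len(groups)
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < n and values[j + 1][0] == values[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # tied values share the average 1-based rank
        for k in range(i, j + 1):
            rank_sums[values[k][1]] += avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    if tie_term == n**3 - n:
        raise ValueError("all observations are tied; H is undefined")
    h = 12 / (n * (n + 1)) * sum(r * r / len(g) for r, g in zip(rank_sums, groups))
    h -= 3 * (n + 1)
    return h / (1 - tie_term / (n**3 - n))
```

For example, `partition_rows(10, 3)` gives `[(0, 4), (4, 7), (7, 10)]`, and `kruskal_h([1, 2, 3], [4, 5, 6])` gives 27/7. For scale, the reported 22x speedup on 25 cores corresponds to a parallel efficiency of 22/25 = 0.88.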
DOI: 10.1109/TCBB.2020.3043979