BigMPI4py: Python Module for Parallelization of Big Data Objects Discloses Germ Layer Specific DNA Demethylation Motifs

Bibliographic details
Published in: IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 19, No. 3, pp. 1507-1522
Main authors: Ascension, Alex M.; Arauzo-Bravo, Marcos J.
Format: Journal Article
Language: English
Published: United States, IEEE, 1 May 2022
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 1545-5963, 1557-9964
Description
Summary: Parallelization in Python integrates the Message Passing Interface (MPI) via the mpi4py module. Since mpi4py does not support parallelization of objects greater than $2^{31}$ bytes, we developed BigMPI4py, a Python module that wraps mpi4py, supporting object sizes beyond this boundary. BigMPI4py automatically determines the optimal object distribution strategy and uses vectorized methods, achieving higher parallelization efficiency. BigMPI4py facilitates the implementation of Python for Big Data applications in multicore workstations and High Performance Computer systems. We use BigMPI4py to speed up the search for germ line specific de novo DNA methylated/unmethylated motifs from the 59 whole genome bisulfite sequencing (WGBS) DNA methylation samples from 27 human tissues of the ENCODE project. We developed a parallel implementation of the Kruskal-Wallis test to find CpGs with differential methylation across germ layers. The parallel evaluation of the significance of 55 million CpGs achieved a 22x speedup with 25 cores, allowing us to efficiently identify a set of hypermethylated genes in ectoderm- and mesoderm-related tissues and another set in endoderm-related tissues, and finally to discover germ layer specific DNA demethylation motifs. Our results indicate that the DNA methylation signal provides a higher degree of information for the demethylated state than for the methylated state. BigMPI4py is available at https://www.arauzolab.org/tools/bigmpi4py and https://gitlab.com/alexmascension/bigmpi4py and the Jupyter Notebook with the WGBS analysis at https://gitlab.com/alexmascension/wgbs-analysis .
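The size boundary mentioned in the summary can be illustrated with a small planning routine. The following is a minimal sketch, not BigMPI4py's actual API or distribution strategy: it only assumes that a large buffer is first shared evenly among ranks and that each share is then cut into messages strictly smaller than 2^31 bytes. The names `MPI_BYTE_LIMIT`, `split_even`, and `plan_messages` are hypothetical and chosen for this illustration.

```python
# Hypothetical illustration: plan byte ranges so that no single message
# reaches the 2**31-byte boundary quoted in the summary.
MPI_BYTE_LIMIT = 2**31


def split_even(total, parts):
    """Split `total` units into `parts` contiguous (start, stop) ranges
    whose sizes differ by at most one unit."""
    base, extra = divmod(total, parts)
    ranges, start = [], 0
    for i in range(parts):
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def plan_messages(total_bytes, n_ranks, limit=MPI_BYTE_LIMIT):
    """Return one list per rank with the (start, stop) byte ranges to send;
    every range is strictly shorter than `limit`."""
    plan = []
    for start, stop in split_even(total_bytes, n_ranks):
        size = stop - start
        # Ceiling division: number of messages needed so each is <= limit - 1.
        n_msgs = max(1, -(-size // (limit - 1)))
        plan.append([(start + a, start + b) for a, b in split_even(size, n_msgs)])
    return plan


if __name__ == "__main__":
    # A 10 GiB buffer shared between 2 ranks: each 5 GiB share is cut
    # into several messages that stay below the boundary.
    for rank, msgs in enumerate(plan_messages(10 * 2**30, 2)):
        print(rank, [stop - start for start, stop in msgs])
```

In a real program each planned range would be sent as a separate MPI message and reassembled on the receiving rank; that step is omitted here so the sketch runs without an MPI installation.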
DOI: 10.1109/TCBB.2020.3043979