GBC: a parallel toolkit based on highly addressable byte-encoding blocks for extremely large-scale genotypes of species

Bibliographic details
Published in:	Genome Biology, vol. 24, no. 1, p. 76
Main authors:	Zhang, Liubin; Yuan, Yangyang; Peng, Wenjie; Tang, Bin; Li, Mulin Jun; Gui, Hongsheng; Wang, Qiang; Li, Miaoxin
Format:	Journal Article
Language:	English
Published:	London: BioMed Central, 17 April 2023; Springer Nature B.V.; BMC
Subjects:	Algorithms; Animal Genetics and Genomics; Bioinformatics; Biomedical and Life Sciences; Byte-encoding genotypes; Chromosomes; Cloud computation; Compression; Data Compression - methods; Datasets; Design; Evolutionary Biology; genome; Genomes; Genomics; Genomics - methods; Genotype; Genotype & phenotype; Genotype compression; Genotype management; Genotypes; Highly addressable genotype blocks; Human Genetics; Humans; Large-scale genotypes; Life Sciences; Localization; memory; Method; Microbial Genetics and Genomics; Parallelization algorithm; Plant Genetics and Genomics; Software; species; Whole genome sequencing
ISSN:	1474-760X, 1474-7596
Online access:	Full text

Description
Abstract:	Whole-genome sequencing projects of millions of subjects contain enormous genotypes, entailing a huge memory burden and long computation times. Here, we present GBC, a toolkit for rapidly compressing large-scale genotypes into highly addressable byte-encoding blocks under an optimized parallel framework. We demonstrate that GBC is up to 1000 times faster than state-of-the-art methods at accessing and managing compressed large-scale genotypes while maintaining a competitive compression ratio. We also show that conventional analyses would be substantially sped up if built on GBC to access the genotypes of a large population. GBC’s data structure and algorithms are valuable for accelerating large-scale genomic research.
DOI:	10.1186/s13059-023-02906-z