BOA: A partitioned view of genome assembly

Bibliographic Details
Published in: iScience Vol. 25; no. 11; p. 105273
Main Authors: An, Xiaojing; Ghosh, Priyanka; Keppler, Patrick; Kurt, Sureyya Emre; Krishnamoorthy, Sriram; Sadayappan, Ponnuswamy; Rajam, Aravind Sukumaran; Çatalyürek, Ümit V.; Kalyanaraman, Ananth
Format: Journal Article
Language: English
Published: Elsevier Inc 18.11.2022
ISSN: 2589-0042
Description
Summary: De novo genome assembly is a fundamental problem in computational molecular biology that aims to reconstruct an unknown genome sequence from a set of short DNA sequences (or reads) obtained from the genome. The relative ordering of the reads along the target genome is not known a priori, which is one of the main contributors to the increased complexity of the assembly process. In this article, with the dual objective of improving assembly quality and exposing a high degree of parallelism, we present a partitioning-based approach. Our framework, BOA (bucket-order-assemble), uses bucketing alongside graph- and hypergraph-based partitioning techniques to produce a partial ordering of the reads. This partial ordering enables us to divide the read set into disjoint blocks that can be independently assembled in parallel using any state-of-the-art serial assembler of choice. Experimental results show that BOA improves both the overall assembly quality and performance.

Highlights:
• A graph/hypergraph partitioning based method to improve assembly quality and runtime
• Bucketing and graph/hypergraph partitioning to partition reads into blocks
• Each block is then independently assembled using any standalone assembler
• Hypergraph variant produces more precise contigs and is faster than state-of-the-art assemblers

Subjects: Genomics; Bioinformatics; High-performance computing in bioinformatics; Algorithms.
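The bucket, partition, and assemble stages named in the summary can be illustrated with a minimal Python sketch. This is not the BOA implementation: the minimizer-based bucketing rule, the greedy packing used as a stand-in for graph or hypergraph partitioning, and all function names (`minimizer`, `bucket_reads`, `pack_buckets`, `assemble_blocks`) are assumptions made purely for illustration. The `assembler` argument is a hypothetical callable standing in for whichever serial assembler is chosen.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def minimizer(read, k):
    """Return the lexicographically smallest k-mer of a read.

    Assumes len(read) >= k; shorter reads would need separate handling.
    """
    return min(read[i:i + k] for i in range(len(read) - k + 1))


def bucket_reads(reads, k):
    """Bucket step (assumed rule): group read indices by their minimizer."""
    buckets = defaultdict(list)
    for idx, read in enumerate(reads):
        buckets[minimizer(read, k)].append(idx)
    return buckets


def pack_buckets(buckets, num_blocks):
    """Stand-in for the partitioning step.

    Whole buckets are packed greedily, largest first, into the currently
    lightest block. A real graph or hypergraph partitioner would also try
    to keep overlapping buckets together; this sketch only balances size.
    """
    blocks = [[] for _ in range(num_blocks)]
    for key in sorted(buckets, key=lambda m: (-len(buckets[m]), m)):
        lightest = min(range(num_blocks), key=lambda b: len(blocks[b]))
        blocks[lightest].extend(buckets[key])
    return blocks


def assemble_blocks(reads, blocks, assembler):
    """Assemble step: run the given assembler on each disjoint block."""
    def run(block):
        return assembler([reads[i] for i in block])

    with ThreadPoolExecutor() as pool:
        return list(pool.map(run, blocks))


if __name__ == "__main__":
    reads = ["ACGTAC", "CGTACG", "GGTTGG", "GTTGGT"]
    blocks = pack_buckets(bucket_reads(reads, k=3), num_blocks=2)
    # The identity "assembler" below is a placeholder, not a real tool.
    print(assemble_blocks(reads, blocks, assembler=list))
```

Because every read lands in exactly one bucket and every bucket in exactly one block, the blocks are disjoint, which is the property that lets them be handed to independent assembler runs.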
These authors contributed equally
Lead contact
DOI: 10.1016/j.isci.2022.105273