Bit-parallel sequence-to-graph alignment

Graphs are commonly used to represent sets of sequences. Either edges or nodes can be labeled by sequences, so that each path in the graph spells a concatenated sequence. Examples include graphs to represent genome assemblies, such as string graphs and de Bruijn graphs, and graphs to represent a pan...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:bioRxiv
Hauptverfasser: Rautiainen, Mikko, Veli M kinen, Marschall, Tobias
Format: Paper
Sprache:Englisch
Veröffentlicht: Cold Spring Harbor Cold Spring Harbor Laboratory Press 15.05.2018
Cold Spring Harbor Laboratory
Ausgabe:1.1
Schlagworte:
ISSN:2692-8205, 2692-8205
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Graphs are commonly used to represent sets of sequences. Either edges or nodes can be labeled by sequences, so that each path in the graph spells a concatenated sequence. Examples include graphs to represent genome assemblies, such as string graphs and de Bruijn graphs, and graphs to represent a pan-genome and hence the genetic variation present in a population. Being able to align sequencing reads to such graphs is a key step for many analyses and its applications include genome assembly, read error correction, and variant calling with respect to a variation graph. Here, we generalize two linear sequence-to-sequence algorithms to graphs: the Shift-And algorithm for exact matching and Myers' bitvector algorithm for semi-global alignment. These linear algorithms are both based on processing w sequence characters with a constant number of operations, where w is the word size of the machine (commonly 64), and achieve a speedup of w over naive algorithms. Our bitvector-based graph alignment algorithm reaches a worst case runtime of O(V + m/w E log(w)) for acyclic graphs and O(V + mE log(w)) for arbitrary cyclic graphs. We apply it to four different types of graphs and observe a speedup between 3.1-fold and 10.1-fold compared to previous algorithms.
Bibliographie:SourceType-Working Papers-1
ObjectType-Working Paper/Pre-Print-1
content type line 50
ISSN:2692-8205
2692-8205
DOI:10.1101/323063