Bit-parallel sequence-to-graph alignment

Bibliographic details
Published in: bioRxiv
Main authors: Rautiainen, Mikko; Mäkinen, Veli; Marschall, Tobias
Format: Paper
Language: English
Published: Cold Spring Harbor: Cold Spring Harbor Laboratory Press, 15 May 2018
Cold Spring Harbor Laboratory
Edition: 1.1
ISSN: 2692-8205
Description
Summary: Graphs are commonly used to represent sets of sequences. Either edges or nodes can be labeled by sequences, so that each path in the graph spells a concatenated sequence. Examples include graphs that represent genome assemblies, such as string graphs and de Bruijn graphs, and graphs that represent a pan-genome and hence the genetic variation present in a population. Aligning sequencing reads to such graphs is a key step in many analyses; applications include genome assembly, read error correction, and variant calling with respect to a variation graph. Here, we generalize two linear sequence-to-sequence algorithms to graphs: the Shift-And algorithm for exact matching and Myers' bitvector algorithm for semi-global alignment. Both linear algorithms process w sequence characters with a constant number of operations, where w is the word size of the machine (commonly 64), and achieve a speedup of w over naive algorithms. Our bitvector-based graph alignment algorithm reaches a worst-case runtime of O(V + ⌈m/w⌉ E log w) for acyclic graphs and O(V + mE log w) for arbitrary cyclic graphs. We apply it to four different types of graphs and observe a speedup between 3.1-fold and 10.1-fold compared to previous algorithms.
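To make the Shift-And idea named in the summary concrete, here is a minimal Python sketch. It is not the authors' implementation. The function names (`build_masks`, `shift_and`, `shift_and_dag`) are hypothetical, and the graph variant assumes single-character node labels with nodes already numbered in topological order. Python integers are unbounded, so the machine word size w does not appear explicitly; in a fixed-width language the pattern would be split into blocks of w bits.

```python
from collections import defaultdict


def build_masks(pattern):
    """Map each character to a bitmask with bit i set where pattern[i] is that character."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    masks = defaultdict(int)
    for i, ch in enumerate(pattern):
        masks[ch] |= 1 << i
    return masks


def shift_and(text, pattern):
    """Return the 0-based end positions of exact occurrences of pattern in text."""
    masks = build_masks(pattern)
    accept = 1 << (len(pattern) - 1)
    state = 0
    ends = []
    for pos, ch in enumerate(text):
        # Bit i of state is set iff pattern[:i + 1] matches the text ending at pos.
        state = ((state << 1) | 1) & masks[ch]
        if state & accept:
            ends.append(pos)
    return ends


def shift_and_dag(labels, edges, pattern):
    """Exact matching on a node-labeled DAG (illustrative assumption-laden sketch).

    labels: one character per node, nodes numbered in topological order.
    edges: (u, v) pairs with u < v.
    Returns the nodes at which some path spelling pattern ends.
    """
    masks = build_masks(pattern)
    accept = 1 << (len(pattern) - 1)
    preds = defaultdict(list)
    for u, v in edges:
        preds[v].append(u)
    states = [0] * len(labels)
    hits = []
    for v, ch in enumerate(labels):
        # Merge the states of all in-neighbors with a bitwise OR, then
        # apply the same update as in the linear case.
        incoming = 0
        for u in preds[v]:
            incoming |= states[u]
        states[v] = ((incoming << 1) | 1) & masks[ch]
        if states[v] & accept:
            hits.append(v)
    return hits
```

For example, `shift_and("abracadabra", "abra")` reports the two occurrences ending at positions 3 and 10. On a small bubble graph with labels `["A", "C", "G", "T"]` and edges `[(0, 1), (0, 2), (1, 3), (2, 3)]`, `shift_and_dag` finds "AGT" ending at node 3 and finds no match for "CG", since no path visits C and then G.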
Bibliography: SourceType-Working Papers-1
ObjectType-Working Paper/Pre-Print-1
DOI: 10.1101/323063