LoRDEC: accurate and efficient long read error correction

Detailed bibliography
Published in: Bioinformatics (Oxford, England), Vol. 30, No. 24, pp. 3506-3514
Main authors: Salmela, Leena; Rivals, Eric
Format: Journal Article
Language: English
Publisher: Oxford University Press (OUP), England, 15 December 2014
ISSN: 1367-4803, 1367-4811
Description
Abstract:
Motivation: PacBio single molecule real-time sequencing is a third-generation sequencing technique that produces long reads, but with comparatively lower throughput and a higher error rate. The errors include numerous indels and complicate downstream analyses such as mapping or de novo assembly. A hybrid strategy that takes advantage of the high accuracy of second-generation short reads has been proposed for correcting long reads. Mapping short reads onto long reads provides sufficient coverage to eliminate up to 99% of errors, but at the expense of prohibitive running times and considerable amounts of disk and memory space.
Results: We present LoRDEC, a hybrid error correction method that builds a succinct de Bruijn graph representing the short reads, and seeks a corrective sequence for each erroneous region in the long reads by traversing chosen paths in the graph. Compared with available tools, LoRDEC is at least six times faster and requires at least 93% less memory or disk space, while achieving comparable accuracy.
Availability and implementation: LoRDEC is written in C++, tested on Linux platforms and freely available at http://atgc.lirmm.fr/lordec.
Contact: lordec@lirmm.fr
Supplementary information: Supplementary data are available at Bioinformatics online.
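The abstract summarizes the method in one sentence: represent the short reads as a de Bruijn graph, then replace each erroneous region of a long read with a sequence spelled by a path through that graph. The toy Python sketch below illustrates that idea under loudly stated assumptions; it is not LoRDEC's code. It stores k-mers in a plain set (not a succinct structure), ignores reverse complements, calls a long-read k-mer "solid" if it occurs in the short reads and "weak" otherwise, bridges only weak runs flanked by solid k-mers on both sides, and picks the bridging path with the smallest edit distance to the original region. All function names (`build_kmer_set`, `paths_between`, `correct_long_read`) are hypothetical, and the path-selection rule is an illustrative choice; the full paper describes the actual procedure.

```python
"""Toy sketch of hybrid long-read correction with a de Bruijn graph of short reads.

Hypothetical illustration, not LoRDEC's implementation. Assumptions: k-mers kept
in a plain Python set, forward strand only, depth-first path search capped by a
length and step budget, and only weak regions flanked by solid k-mers corrected.
"""


def build_kmer_set(short_reads, k):
    """Collect every k-mer of the short reads; these are the graph's nodes."""
    kmers = set()
    for read in short_reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    return kmers


def edit_distance(a, b):
    """Levenshtein distance computed with a single rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def paths_between(kmers, source, target, max_len, max_steps=10_000):
    """Yield the sequences spelled by graph paths from source to target."""
    stack = [(source, source)]
    steps = 0
    while stack and steps < max_steps:
        steps += 1
        node, seq = stack.pop()
        if node == target:
            yield seq
        if len(seq) >= max_len:
            continue  # length cap also stops the search from looping on cycles
        for base in "ACGT":
            nxt = node[1:] + base  # successors overlap the node by k-1 bases
            if nxt in kmers:
                stack.append((nxt, seq + base))


def correct_long_read(read, kmers, k):
    """Replace each weak region bracketed by solid k-mers with the closest path."""
    solid = [read[i:i + k] in kmers for i in range(len(read) - k + 1)]
    out, cursor, i = [], 0, 0
    while i < len(solid):
        if solid[i]:
            i += 1
            continue
        start = i
        while i < len(solid) and not solid[i]:
            i += 1
        s, t = start - 1, i  # positions of the flanking solid k-mers
        if s < 0 or t >= len(solid):
            continue  # weak region at a read end: left uncorrected in this sketch
        segment = read[s:t + k]
        candidates = paths_between(kmers, read[s:s + k], read[t:t + k], 2 * len(segment))
        best = min(candidates, key=lambda c: edit_distance(c, segment), default=None)
        if best is None:
            continue  # no bridging path found: keep the original bases
        out.append(read[cursor:s])
        out.append(best[max(0, cursor - s):])  # skip bases already emitted
        cursor = t + k
    out.append(read[cursor:])
    return "".join(out)


if __name__ == "__main__":
    reference = "GATTACAGCCTGAACGTTCA"
    short_reads = ["GATTACAGCCTG", "GCCTGAACGTTCA"]  # toy reads covering the reference
    kmers = build_kmer_set(short_reads, k=5)
    noisy = "GATTACAGCCAGAACGTTCA"  # substitution T->A at position 10
    print(correct_long_read(noisy, kmers, k=5))  # prints GATTACAGCCTGAACGTTCA
```

In this toy data every 4-mer of the reference is distinct, so the graph is a single path and each weak region has exactly one bridging candidate; on real data many paths compete, which is why the sketch scores candidates by edit distance and bounds the search.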
Associate Editor: Michael Brudno
DOI: 10.1093/bioinformatics/btu538