RefSeq database growth influences the accuracy of k-mer-based lowest common ancestor species identification

Bibliographic Details
Published in: Genome Biology, Volume 19, Issue 1, p. 165
Main authors: Nasko, Daniel J.; Koren, Sergey; Phillippy, Adam M.; Treangen, Todd J.
Format: Journal Article
Language: English
Published: London: BioMed Central, 30 October 2018
Springer Nature B.V
BMC
ISSN: 1474-760X, 1474-7596
Description
Summary: In order to determine the role of the database in taxonomic sequence classification, we examine the influence of the database over time on k-mer-based lowest common ancestor taxonomic classification. We present three major findings: the number of new species added to the NCBI RefSeq database greatly outpaces the number of new genera; as a result, more reads are classified with newer database versions, but fewer are classified at the species level; and Bayesian-based re-estimation mitigates this effect but struggles with novel genomes. These results suggest a need for new classification approaches specially adapted for large databases.
DOI: 10.1186/s13059-018-1554-6