RefSeq database growth influences the accuracy of k-mer-based lowest common ancestor species identification
In order to determine the role of the database in taxonomic sequence classification, we examine the influence of the database over time on k-mer-based lowest common ancestor taxonomic classification. We present three major findings: the number of new species added to the NCBI RefSeq database greatl...
| Published in: | Genome Biology, Volume 19, Issue 1, p. 165 |
|---|---|
| Main authors: | , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | London, BioMed Central, 30 October 2018, Springer Nature B.V, BMC |
| Subjects: | |
| ISSN: | 1474-760X, 1474-7596 |
| Online access: | Get full text |
| Summary: | In order to determine the role of the database in taxonomic sequence classification, we examine the influence of the database over time on k-mer-based lowest common ancestor taxonomic classification. We present three major findings: the number of new species added to the NCBI RefSeq database greatly outpaces the number of new genera; as a result, more reads are classified with newer database versions, but fewer are classified at the species level; and Bayesian-based re-estimation mitigates this effect but struggles with novel genomes. These results suggest a need for new classification approaches specially adapted for large databases. |
|---|---|
| DOI: | 10.1186/s13059-018-1554-6 |