CheckV assesses the quality and completeness of metagenome-assembled viral genomes

Bibliographic Details
Published in: Nature Biotechnology, Vol. 39, no. 5, pp. 578-585
Main Authors: Nayfach, Stephen; Camargo, Antonio Pedro; Schulz, Frederik; Eloe-Fadrosh, Emiley; Roux, Simon; Kyrpides, Nikos C.
Format: Journal Article
Language: English
Published: New York: Nature Publishing Group US, 01.05.2021
Nature Publishing Group
Springer Nature
ISSN: 1087-0156, 1546-1696
Description
Summary: Millions of new viral sequences have been identified from metagenomes, but the quality and completeness of these sequences vary considerably. Here we present CheckV, an automated pipeline for identifying closed viral genomes, estimating the completeness of genome fragments and removing flanking host regions from integrated proviruses. CheckV estimates completeness by comparing sequences with a large database of complete viral genomes, including 76,262 identified from a systematic search of publicly available metagenomes, metatranscriptomes and metaviromes. After validation on mock datasets and comparison to existing methods, we applied CheckV to large and diverse collections of metagenome-assembled viral sequences, including IMG/VR and the Global Ocean Virome. This revealed 44,652 high-quality viral genomes (that is, >90% complete), although the vast majority of sequences were small fragments, which highlights the challenge of assembling viral genomes from short-read metagenomes. Additionally, we found that removal of host contamination substantially improved the accurate identification of auxiliary metabolic genes and interpretation of viral-encoded functions. The quality of viral genomes assembled from metagenome data is assessed by CheckV.
Bibliography:
AC02-05CH11231; 2016/23218-0; 2018/04240-0
São Paulo Research Foundation (FAPESP)
USDOE Office of Science (SC), Biological and Environmental Research (BER)
DOI: 10.1038/s41587-020-00774-7