Identification of mobile genetic elements with geNomad

Bibliographic Details
Published in: Nature Biotechnology, Vol. 42, no. 8, pp. 1303-1312
Main Authors: Camargo, Antonio Pedro; Roux, Simon; Schulz, Frederik; Babinski, Michal; Xu, Yan; Hu, Bin; Chain, Patrick S. G.; Nayfach, Stephen; Kyrpides, Nikos C.
Format: Journal Article
Language: English
Published: New York: Nature Publishing Group US, 01.08.2024
Nature Publishing Group
Springer Nature
ISSN: 1087-0156, 1546-1696
Description
Summary: Identifying and characterizing mobile genetic elements in sequencing data is essential for understanding their diversity, ecology, biotechnological applications and impact on public health. Here we introduce geNomad, a classification and annotation framework that combines information from gene content and a deep neural network to identify sequences of plasmids and viruses. geNomad uses a dataset of more than 200,000 marker protein profiles to provide functional gene annotation and taxonomic assignment of viral genomes. Using a conditional random field model, geNomad also detects proviruses integrated into host genomes with high precision. In benchmarks, geNomad achieved high classification performance for diverse plasmids and viruses (Matthews correlation coefficient of 77.8% and 95.3%, respectively), substantially outperforming other tools. Leveraging geNomad’s speed and scalability, we processed over 2.7 trillion base pairs of sequencing data, leading to the discovery of millions of viruses and plasmids that are available through the IMG/VR and IMG/PR databases. geNomad is available at https://portal.nersc.gov/genomad. geNomad identifies mobile genetic elements in sequencing data.
Contract numbers: AC02-05CH11231; 89233218CNA000001; AC05-00OR22725; AC05-76RL01830
Funding: USDOE Office of Science (SC), Basic Energy Sciences (BES). Scientific User Facilities (SUF)
USDOE Office of Science (SC), Biological and Environmental Research (BER). Biological Systems Science (BSS)
USDOE National Nuclear Security Administration (NNSA)
DOI: 10.1038/s41587-023-01953-y