Improved metagenome binning and assembly using deep variational autoencoders

Detailed bibliography
Published in: Nature Biotechnology, Vol. 39, No. 5, pp. 555-560
Authors: Nissen, Jakob Nybo; Johansen, Joachim; Allesøe, Rosa Lundbye; Sønderby, Casper Kaae; Armenteros, Jose Juan Almagro; Grønbech, Christopher Heje; Jensen, Lars Juhl; Nielsen, Henrik Bjørn; Petersen, Thomas Nordahl; Winther, Ole; Rasmussen, Simon
Format: Journal Article
Language: English
Published: New York: Nature Publishing Group US, 1 May 2021
ISSN: 1087-0156, 1546-1696
Description
Abstract: Despite recent advances in metagenomic binning, reconstruction of microbial species from metagenomics data remains challenging. Here we develop variational autoencoders for metagenomic binning (VAMB), a program that uses deep variational autoencoders to encode sequence coabundance and k-mer distribution information before clustering. We show that a variational autoencoder is able to integrate these two distinct data types without any previous knowledge of the datasets. VAMB outperforms existing state-of-the-art binners, reconstructing 29–98% and 45% more near-complete (NC) genomes on simulated and real data, respectively. Furthermore, VAMB is able to separate closely related strains up to 99.5% average nucleotide identity (ANI), and reconstructed 255 and 91 NC Bacteroides vulgatus and Bacteroides dorei sample-specific genomes as two distinct clusters from a dataset of 1,000 human gut microbiome samples. We use 2,606 NC bins from this dataset to show that species of the human gut microbiome have different geographical distribution patterns. VAMB can be run on standard hardware and is freely available at https://github.com/RasmussenLab/vamb. Metagenomics data are resolved into their constituent genomes using a new deep learning method.
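The abstract describes VAMB as encoding two per-contig signals, coabundance across samples and k-mer distribution, before clustering. The Python sketch below only illustrates what such a combined input vector could look like. Using k = 4, merging each k-mer with its reverse complement, and normalizing per-sample depths to proportions are assumptions made for illustration, not VAMB's documented preprocessing, and all function names are hypothetical rather than part of VAMB's API. The variational autoencoder and clustering steps are not shown.

```python
from itertools import product

K = 4  # assumed k-mer length (tetranucleotides); illustrative, not VAMB's documented setting
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_kmers(k: int = K) -> list[str]:
    """List every k-mer over ACGT once, merging each k-mer with its reverse complement."""
    seen: set[str] = set()
    result: list[str] = []
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        canon = min(kmer, revcomp(kmer))
        if canon not in seen:
            seen.add(canon)
            result.append(canon)
    return result


def kmer_profile(contig: str, k: int = K) -> list[float]:
    """Relative frequency of each canonical k-mer; windows with non-ACGT bases are skipped."""
    index = {kmer: i for i, kmer in enumerate(canonical_kmers(k))}
    counts = [0] * len(index)
    seq = contig.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if set(window) <= set("ACGT"):
            counts[index[min(window, revcomp(window))]] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def abundance_profile(depths: list[float]) -> list[float]:
    """Hypothetical coabundance signal: per-sample depths of one contig scaled to sum to 1."""
    total = sum(depths)
    return [d / total for d in depths] if total else [0.0] * len(depths)


def contig_features(contig: str, depths: list[float], k: int = K) -> list[float]:
    """Concatenate the abundance and k-mer profiles into one per-contig input vector."""
    return abundance_profile(depths) + kmer_profile(contig, k)


if __name__ == "__main__":
    features = contig_features("ACGTACGTAA", [10.0, 30.0])
    print(len(features), features[:2])
```

For k = 4 there are 136 canonical k-mers, so a contig observed in two samples maps to a 138-dimensional vector here. Keeping both profiles as proportions puts them on comparable scales before they are passed to a model, which is one simple way to combine two distinct data types.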
DOI: 10.1038/s41587-020-00777-4