Interpretable metric learning in comparative metagenomics: The adaptive Haar-like distance

Bibliographic details
Published in: PLoS Computational Biology, Vol. 20, No. 5, p. e1011543
Main authors: Gorman, Evan D.; Lladser, Manuel E.
Format: Journal Article
Language: English
Published: United States, Public Library of Science (PLoS), 1 May 2024
ISSN: 1553-7358, 1553-734X
Description
Summary: Random forests have emerged as a promising tool in comparative metagenomics because they can predict environmental characteristics based on microbial composition in datasets where β-diversity metrics fall short of revealing meaningful relationships between samples. Nevertheless, despite this efficacy, they lack biological insight in tandem with their predictions, potentially hindering scientific advancement. To overcome this limitation, we leverage a geometric characterization of random forests to introduce a data-driven phylogenetic β-diversity metric, the adaptive Haar-like distance. This new metric assigns a weight to each internal node (i.e., split or bifurcation) of a reference phylogeny, indicating the relative importance of that node in discerning environmental samples based on their microbial composition. Alongside this, a weighted nearest-neighbors classifier, constructed using the adaptive metric, can be used as a proxy for the random forest while maintaining accuracy on par with that of the original forest and another state-of-the-art classifier, CoDaCoRe. As shown in datasets from diverse microbial environments, however, the new metric and classifier significantly enhance the biological interpretability and visualization of high-dimensional metagenomic samples.
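The summary describes a metric that weights each internal node of a phylogeny and a nearest-neighbors classifier built on that metric. The following is only a minimal sketch of that general idea, not the authors' implementation: it assumes each sample has already been reduced to one coordinate per internal node, and it uses hand-picked node weights. The function names, toy data, and weights are all hypothetical; the article's actual derivation of the weights from a random forest is not reproduced here.

```python
import math
from collections import defaultdict


def weighted_distance(x, y, weights):
    """Weighted Euclidean distance: sqrt(sum_v w_v * (x_v - y_v)^2)."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


def knn_predict(query, samples, labels, weights, k=3):
    """Inverse-distance-weighted vote among the k nearest samples."""
    ranked = sorted(
        (weighted_distance(query, s, weights), lab)
        for s, lab in zip(samples, labels)
    )
    votes = defaultdict(float)
    for dist, lab in ranked[:k]:
        votes[lab] += 1.0 / (dist + 1e-12)  # small constant avoids division by zero
    return max(votes, key=votes.get)


# Hypothetical toy data: two per-node coordinates per sample.
# Node 0 separates the two environments; node 1 is large but uninformative.
samples = [[0.0, -5.0], [0.1, -4.0], [1.0, 4.5], [0.9, 5.0]]
labels = ["gut", "gut", "soil", "soil"]
query = [0.05, 4.6]

adaptive = knn_predict(query, samples, labels, weights=[1.0, 0.0])
uniform = knn_predict(query, samples, labels, weights=[1.0, 1.0])
print(adaptive, uniform)
```

In this toy setting, down-weighting the uninformative node changes which neighbors count as close, which is the sense in which per-node weights can make a distance both more discriminative and easier to read.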
DOI: 10.1371/journal.pcbi.1011543