Interpretable metric learning in comparative metagenomics: The adaptive Haar-like distance

Bibliographic Details
Published in: PLoS Computational Biology, Vol. 20, no. 5, p. e1011543
Main Authors: Gorman, Evan D., Lladser, Manuel E.
Format: Journal Article
Language: English
Published: United States: Public Library of Science (PLoS), 01.05.2024
ISSN: 1553-7358, 1553-734X
Description
Summary: Random forests have emerged as a promising tool in comparative metagenomics because they can predict environmental characteristics based on microbial composition in datasets where β-diversity metrics fall short of revealing meaningful relationships between samples. Nevertheless, despite this efficacy, they lack biological insight in tandem with their predictions, potentially hindering scientific advancement. To overcome this limitation, we leverage a geometric characterization of random forests to introduce a data-driven phylogenetic β-diversity metric, the adaptive Haar-like distance. This new metric assigns a weight to each internal node (i.e., split or bifurcation) of a reference phylogeny, indicating the relative importance of that node in discerning environmental samples based on their microbial composition. Alongside this, a weighted nearest-neighbors classifier, constructed using the adaptive metric, can be used as a proxy for the random forest while maintaining accuracy on par with that of the original forest and another state-of-the-art classifier, CoDaCoRe. As shown in datasets from diverse microbial environments, however, the new metric and classifier significantly enhance the biological interpretability and visualization of high-dimensional metagenomic samples.
DOI: 10.1371/journal.pcbi.1011543