Empathi: embedding-based phage protein annotation tool by hierarchical assignment
| Published in: | Nature Communications Vol. 16; no. 1; pp. 9114 - 9 |
|---|---|
| Format: | Journal Article |
| Language: | English |
| Published: | London, Nature Publishing Group UK, 14.10.2025 (Nature Publishing Group, Nature Portfolio) |
| ISSN: | 2041-1723 |
| Summary: | Bacteriophages, viruses that infect bacteria, are estimated to outnumber their cellular hosts 10-fold and act as key players in all microbial ecosystems. Under evolutionary pressure from their hosts, they evolve rapidly and encode a large diversity of protein sequences. Consequently, the functions of most phage proteins remain elusive. Current tools for comprehensively identifying phage protein functions from sequence lack either sensitivity (for instance, those relying on homology) or specificity (those assigning a single coarse-grained function to a protein). Here, we introduce Empathi, a protein-embedding-based classifier that assigns functions in a hierarchical manner. New categories were developed specifically for phage protein functions and organized so that molecular-level functions are respected within each category, making them well suited for training machine learning classifiers based on protein embeddings. Empathi outperforms homology-based methods on a dataset of cultured phage genomes, tripling the number of annotated homologous groups. On EnVhogDB, the most recent and extensive database of metagenomically sourced phage proteins, Empathi doubled the annotated fraction of protein families from 16% to 33%. A more global view of the repertoire of functions a phage possesses will help to better understand phages and their interactions with bacteria. Bacteriophages (the viruses that infect bacteria) play key roles in microbial communities, but the functions of most of their genes remain unknown. Here, Boulay et al. present a machine-learning classifier that uses protein language models to assign functions to bacteriophage proteins more accurately than existing approaches. |
|---|---|
| DOI: | 10.1038/s41467-025-64177-5 |