Selection of an Ideal Machine Learning Framework for Predicting Perturbation Effects on Network Topology of Bacterial KEGG Pathways

Bibliographic Details
Published in: bioRxiv
Main Authors: Robben, Michael, Nasr, Mohammad Sadegh, Das, Avishek, Huber, Manfred, Jaworski, Justyn, Weidanz, Jon, Luber, Jacob M
Format: Paper
Language: English
Published: Cold Spring Harbor: Cold Spring Harbor Laboratory Press, 22.07.2022
Edition: 1.1
ISSN: 2692-8205
Description
Summary: Biological networks for bacterial species are used to assign functional information to newly sequenced organisms, but network quality can be strongly affected by poor gene annotations. Current methods of gene annotation use homologous alignment to determine orthology and have been shown to degrade network accuracy in non-model bacterial species. To address these issues in the KEGG pathway database, we investigated the ability of machine learning (ML) algorithms to re-annotate bacterial genes based on motif or homology information. The majority of the ensemble, clustering, and deep learning algorithms that we explored showed higher prediction accuracy than CD-HIT in predicting EC ID, Map ID, and partial Map ID. Motif-based machine learning methods of annotation in new species were more accurate, faster, and had higher precision-recall than methods of homologous alignment or orthologous gene clustering. Gradient-boosted ensemble methods and neural networks also predicted higher network connectivity, finding twice as many new pathway interactions as BLAST alignment. The use of motif-based machine learning algorithms in annotation software will allow researchers to develop powerful network tools to interact with bacterial microbiomes in ways previously unachievable through homologous sequence alignment.
Footnotes: https://github.com/RobbenUTA/Functional-ML
Competing Interest Statement: The authors have declared no competing interest.
DOI: 10.1101/2022.07.21.501034