Predicting potential microbe-disease associations based on heterogeneous graph attention network and deep sparse autoencoder


Bibliographic Details
Published in: Engineering Applications of Artificial Intelligence, Vol. 147, p. 110301
Main Authors: Wang, Bo; Zhao, Wenlong; Du, Xiaoxin; Zhang, Jianfei; Zhang, Chunyu; Wang, Liping; He, Yang
Format: Journal Article
Language: English
Published: Elsevier Ltd 01.05.2025
ISSN: 0952-1976
Description
Summary: Identifying potential associations between microbes and diseases is crucial for explaining disease pathogenesis and designing targeted therapeutic strategies. Basic biological experiments for microbe-disease association (MDA) prediction are costly, time-consuming, and labor-intensive, whereas computational methods can effectively complement traditional biological experiments. We propose a computational framework called graph attention convolutional deep sparse autoencoder microbe-disease association (GCDSAEMDA) to predict unknown MDAs. First, we calculate the semantic similarity and Gaussian interaction profile (GIP) similarity of diseases, as well as the functional similarity and GIP similarity of microbes, and integrate these similarity matrices to construct a heterogeneous graph. Next, a multi-head dynamic graph attention mechanism is employed to extract low-order features of microbe and disease nodes in the heterogeneous graph, while multiple convolutional neural networks with different kernels aggregate and concatenate these low-order features to form new high-order representations. Third, we apply a cosine distance-based k-means clustering to select reliable negative samples and use a deep sparse autoencoder to extract high-order features of microbe-disease pairs. Finally, an ensemble Light Gradient Boosting Machine (LightGBM) algorithm is used to predict potential MDAs. GCDSAEMDA was compared to four state-of-the-art MDA models on the Human Microbe-Disease Association Database (HMDAD) and Disbiome databases and validated through five-fold cross-validation on diseases, microbes, and microbe-disease pairs. Results indicate that GCDSAEMDA outperforms the other four models in MDA prediction. Additionally, case studies demonstrate the robust predictive capability of GCDSAEMDA. The source code and datasets for GCDSAEMDA are available at https://github.com/chenyunmolu/GCDSAEMDA.
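The GIP similarity mentioned in the summary can be illustrated with a short sketch. It is not taken from the GCDSAEMDA repository: it assumes the commonly used formulation in which the similarity of two interaction profiles is exp(-gamma * squared distance), with gamma set to a bandwidth divided by the mean squared norm of all profiles. The function name `gip_similarity`, the `bandwidth` parameter, and the toy association matrix are hypothetical.

```python
import math

def gip_similarity(profiles, bandwidth=1.0):
    """Gaussian interaction profile kernel over the rows of a 0/1 matrix.

    Assumed formulation: sim(i, j) = exp(-gamma * ||p_i - p_j||^2), where
    gamma = bandwidth / (mean squared norm of the profiles).
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(v * v for v in row) for row in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = bandwidth / mean_sq_norm
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            sim[i][j] = sim[j][i] = math.exp(-gamma * sq_dist)
    return sim

# Toy association matrix (hypothetical data): rows are microbes, columns are diseases.
associations = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]
microbe_gip = gip_similarity(associations)
# Transpose so that each row is one disease's interaction profile.
disease_gip = gip_similarity([list(col) for col in zip(*associations)])
```

In this toy example the first two microbes share a disease, so their GIP similarity is higher than that between the first and third microbes, which share none.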
• Low-level feature extraction based on a multi-head dynamic graph attention mechanism.
• Negative sample selection using k-means clustering based on cosine distance.
• High-level feature extraction based on a deep sparse autoencoder.
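The negative sample selection step can be sketched as follows. The record does not say how samples are drawn from the clusters, so this is only one plausible reading: unlabeled pair vectors are clustered with k-means under cosine distance (spherical k-means), and the points nearest each centroid are kept as negatives. The selection rule, the function names, and the toy feature vectors are all assumptions, not the authors' implementation.

```python
import math
import random

def _normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def _cosine_distance(u, v):
    # Inputs are unit vectors, so the dot product equals the cosine similarity.
    return 1.0 - sum(a * b for a, b in zip(u, v))

def cosine_kmeans(vectors, k, iterations=20, seed=0):
    """Cluster vectors with k-means under cosine distance (spherical k-means)."""
    points = [_normalize(v) for v in vectors]
    rng = random.Random(seed)
    centroids = [points[i] for i in rng.sample(range(len(points)), k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by cosine distance.
        labels = [min(range(k), key=lambda c: _cosine_distance(p, centroids[c]))
                  for p in points]
        # Update step: re-normalized mean of the cluster members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[c] = _normalize(mean)
    return labels, centroids, points

def select_negatives(unlabeled, k, per_cluster, seed=0):
    """Assumed rule: keep the per_cluster points nearest each centroid."""
    labels, centroids, points = cosine_kmeans(unlabeled, k, seed=seed)
    chosen = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        members.sort(key=lambda i: _cosine_distance(points[i], centroids[c]))
        chosen.extend(members[:per_cluster])
    return sorted(chosen)

# Hypothetical feature vectors for unlabeled microbe-disease pairs.
unlabeled_pairs = [
    [1.0, 0.1], [0.9, 0.2], [2.0, 0.1],
    [0.1, 1.0], [0.2, 0.9], [0.1, 3.0],
]
negative_idx = select_negatives(unlabeled_pairs, k=2, per_cluster=2)
```

Drawing the same number of negatives from every cluster keeps the negative set spread across the feature space rather than concentrated in one region. Other rules, such as random sampling within each cluster, would fit the description equally well.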
DOI: 10.1016/j.engappai.2025.110301