Predicting potential microbe-disease associations based on heterogeneous graph attention network and deep sparse autoencoder
| Published in: | Engineering Applications of Artificial Intelligence, Vol. 147; p. 110301 |
|---|---|
| Main Authors: | , , , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Ltd, 01.05.2025 |
| ISSN: | 0952-1976 |
| Summary: | Identifying potential associations between microbes and diseases is crucial for explaining disease pathogenesis and designing targeted therapeutic strategies. Traditional biological experiments for microbe-disease association (MDA) prediction are costly, time-consuming, and labor-intensive, whereas computational methods can effectively complement them. We propose a computational framework called graph attention convolutional deep sparse autoencoder microbe-disease association (GCDSAEMDA) to predict unknown MDAs. First, we calculate the semantic similarity and Gaussian interaction profile (GIP) similarity of diseases, as well as the functional similarity and GIP similarity of microbes, and integrate these similarity matrices to construct a heterogeneous graph. Second, a multi-head dynamic graph attention mechanism extracts low-order features of the microbe and disease nodes in the heterogeneous graph, and multiple convolutional neural networks with different kernels aggregate and concatenate these low-order features into new high-order representations. Third, we apply cosine-distance-based k-means clustering to select reliable negative samples and use a deep sparse autoencoder to extract high-order features of microbe-disease pairs. Finally, an ensemble Light Gradient Boosting Machine (LightGBM) algorithm predicts potential MDAs. GCDSAEMDA was compared with four state-of-the-art MDA models on the Human Microbe-Disease Association Database (HMDAD) and Disbiome databases and validated through five-fold cross-validation on diseases, microbes, and microbe-disease pairs. The results indicate that GCDSAEMDA outperforms the other four models in MDA prediction. Additionally, case studies demonstrate the robust predictive capability of GCDSAEMDA. The source code and datasets for GCDSAEMDA are available at https://github.com/chenyunmolu/GCDSAEMDA.
Highlights: • Low-level feature extraction based on a multi-head dynamic graph attention mechanism. • Negative sample selection by k-means clustering based on cosine distance. • High-level feature extraction based on a deep sparse autoencoder. |
| DOI: | 10.1016/j.engappai.2025.110301 |
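The summary's first step fuses GIP similarity with biological similarity to build a heterogeneous graph. The sketch below shows one common way to do this in the MDA literature; it is not taken from the GCDSAEMDA repository. It assumes a binary association matrix with microbes as rows and diseases as columns, the standard GIP kernel K(i, j) = exp(-γ‖IP(i) - IP(j)‖²) with γ = γ′ / ((1/n) Σ‖IP(k)‖²) and γ′ = 1, and a hypothetical fusion rule that averages the biological and GIP scores where the biological score is non-zero and otherwise falls back to GIP. The function names and the toy similarity values are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def gip_similarity(profiles: Matrix, gamma_prime: float = 1.0) -> Matrix:
    """GIP kernel between the rows of `profiles` (one interaction profile per row)."""
    n = len(profiles)
    mean_sq_norm = sum(sum(v * v for v in row) for row in profiles) / n
    # Bandwidth normalised by the average squared profile norm.
    gamma = gamma_prime / mean_sq_norm if mean_sq_norm > 0 else 0.0
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            sim[i][j] = math.exp(-gamma * dist_sq)
    return sim


def fuse_similarity(bio: Matrix, gip: Matrix) -> Matrix:
    """Hypothetical fusion rule: average where the biological score exists, else use GIP."""
    return [
        [(b + g) / 2 if b > 0 else g for b, g in zip(bio_row, gip_row)]
        for bio_row, gip_row in zip(bio, gip)
    ]


def heterogeneous_adjacency(sm: Matrix, sd: Matrix, assoc: Matrix) -> Matrix:
    """Block matrix [[S_m, A], [A^T, S_d]]: microbe nodes first, then disease nodes."""
    top = [list(s_row) + list(a_row) for s_row, a_row in zip(sm, assoc)]
    bottom = [list(at_row) + list(s_row) for at_row, s_row in zip(transpose(assoc), sd)]
    return top + bottom


# Toy example: 2 microbes x 3 diseases (made-up data, not from HMDAD or Disbiome).
assoc = [[1, 0, 1], [0, 1, 0]]
microbe_gip = gip_similarity(assoc)
disease_gip = gip_similarity(transpose(assoc))
microbe_func = [[1.0, 0.0], [0.0, 1.0]]  # placeholder functional similarity
disease_sem = [[1.0, 0.4, 0.0], [0.4, 1.0, 0.0], [0.0, 0.0, 1.0]]  # placeholder semantic similarity
graph = heterogeneous_adjacency(
    fuse_similarity(microbe_func, microbe_gip),
    fuse_similarity(disease_sem, disease_gip),
    assoc,
)
```

In this layout the first rows and columns index microbes and the remaining ones index diseases, so the off-diagonal blocks carry the known associations while the diagonal blocks carry the fused similarities.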
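The summary does not define its "multi-head dynamic graph attention mechanism". Assuming it refers to GATv2-style attention, where the attention vector is applied after the LeakyReLU so that the ranking of neighbours can change with the query node, one layer can be sketched as below. The weights are random placeholders rather than trained parameters, every node gets a self-loop, the output activation is omitted, and heads are concatenated; all of these are illustrative assumptions, and the helper names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Vec = List[float]
Mat = List[List[float]]


def matvec(w: Mat, x: Vec) -> Vec:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def leaky_relu(v: Vec, slope: float = 0.2) -> Vec:
    return [x if x > 0 else slope * x for x in v]


def softmax(scores: Vec) -> Vec:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gatv2_head(h: Mat, adj: Mat, w_s: Mat, w_t: Mat, a: Vec) -> Tuple[Mat, Mat]:
    """One GATv2-style head: e_ij = a . LeakyReLU(W_s h_i + W_t h_j)."""
    src = [matvec(w_s, x) for x in h]
    tgt = [matvec(w_t, x) for x in h]
    n = len(h)
    outputs, attention = [], []
    for i in range(n):
        # Neighbours of node i in the heterogeneous graph, plus a self-loop.
        nbrs = [j for j in range(n) if j == i or adj[i][j] > 0]
        scores = [
            sum(ak * zk for ak, zk in zip(a, leaky_relu([s + t for s, t in zip(src[i], tgt[j])])))
            for j in nbrs
        ]
        alpha = softmax(scores)
        row = [0.0] * n
        agg = [0.0] * len(a)
        for weight, j in zip(alpha, nbrs):
            row[j] = weight
            agg = [g + weight * t for g, t in zip(agg, tgt[j])]  # weighted message sum
        outputs.append(agg)
        attention.append(row)
    return outputs, attention


def random_matrix(rng: random.Random, rows: int, cols: int) -> Mat:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def multi_head_gatv2(h: Mat, adj: Mat, out_dim: int, heads: int, seed: int = 0) -> Mat:
    """Concatenate `heads` independent heads that use random (untrained) weights."""
    rng = random.Random(seed)
    in_dim = len(h[0])
    head_outputs = []
    for _ in range(heads):
        w_s = random_matrix(rng, out_dim, in_dim)
        w_t = random_matrix(rng, out_dim, in_dim)
        a = [rng.uniform(-0.5, 0.5) for _ in range(out_dim)]
        out, _ = gatv2_head(h, adj, w_s, w_t, a)
        head_outputs.append(out)
    # Node-wise concatenation of the per-head feature vectors.
    return [sum((out[i] for out in head_outputs), []) for i in range(len(h))]
```

In a trained model W_s, W_t and a would be learned. The point of the sketch is that the score is nonlinear in the pair (h_i, h_j) before it is projected onto a, which is what distinguishes GATv2 from the original GAT score.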
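The summary states that reliable negative samples are chosen by k-means clustering with cosine distance but does not give the selection rule. The sketch below assumes one plausible rule: cluster the feature vectors of unlabeled microbe-disease pairs with spherical k-means (unit-normalized vectors, cosine-distance assignment), then take negatives from the clusters whose centroids are farthest, in cosine distance, from the mean direction of the known positives. `cosine_kmeans` and `select_negatives` are hypothetical helpers, not the authors' code.

```python
import math
import random
from typing import List, Optional, Tuple

Vec = List[float]


def normalize(v: Vec) -> Vec:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_distance(u: Vec, v: Vec) -> float:
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - sum(a * b for a, b in zip(u, v)) / (nu * nv)


def mean_direction(vectors: List[Vec]) -> Vec:
    return normalize([sum(col) / len(vectors) for col in zip(*vectors)])


def cosine_kmeans(points: List[Vec], k: int, iters: int = 100, seed: int = 0) -> Tuple[List[int], List[Vec]]:
    """Spherical k-means: unit vectors, cosine-distance assignment, renormalized centroids."""
    rng = random.Random(seed)
    data = [normalize(p) for p in points]
    centroids = [list(c) for c in rng.sample(data, k)]
    labels: Optional[List[int]] = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: cosine_distance(x, centroids[c])) for x in data]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = mean_direction(members)
    return labels, centroids


def select_negatives(positives: List[Vec], unlabeled: List[Vec], k: int, n_neg: int, seed: int = 0) -> List[int]:
    """Hypothetical rule: pick unlabeled pairs from clusters farthest from the positives."""
    labels, centroids = cosine_kmeans(unlabeled, k, seed=seed)
    pos_dir = mean_direction([normalize(p) for p in positives])
    far_first = sorted(range(k), key=lambda c: cosine_distance(centroids[c], pos_dir), reverse=True)
    chosen: List[int] = []
    for c in far_first:
        for idx, lab in enumerate(labels):
            if lab == c and len(chosen) < n_neg:
                chosen.append(idx)
    return chosen
```

Cosine distance ignores vector magnitude, so pairs are grouped by the direction of their feature profiles; the farthest-cluster rule is only one way to turn those clusters into "reliable" negatives.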
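For the deep sparse autoencoder, a common formulation adds a Kullback-Leibler sparsity penalty to the reconstruction error: J = (1/N) Σ ½‖x - x̂‖² + β Σ_j KL(ρ ‖ ρ̂_j), where ρ̂_j is the mean activation of hidden unit j over the batch. The sketch computes this loss for a stacked sigmoid encoder and decoder. It assumes the penalty is applied to every encoder layer and uses illustrative values ρ = 0.05 and β = 3, which the summary does not report; training by backpropagation is omitted. The bottleneck outputs returned as `codes` stand in for the pair features that would be passed to the LightGBM classifier.

```python
import math
import random
from typing import List, Tuple

Vec = List[float]
Layer = Tuple[List[Vec], Vec]  # (weight rows, biases)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dense(x: Vec, layer: Layer) -> Vec:
    weights, biases = layer
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(weights, biases)]


def init_layers(dims: List[int], rng: random.Random) -> List[Layer]:
    """Small random weights and zero biases for consecutive layer sizes in `dims`."""
    return [
        ([[rng.gauss(0.0, 0.1) for _ in range(d_in)] for _ in range(d_out)], [0.0] * d_out)
        for d_in, d_out in zip(dims[:-1], dims[1:])
    ]


def kl_divergence(rho: float, rho_hat: float) -> float:
    """KL(rho || rho_hat) between Bernoulli distributions; rho_hat is clipped for safety."""
    eps = 1e-12
    rho_hat = min(max(rho_hat, eps), 1.0 - eps)
    return rho * math.log(rho / rho_hat) + (1 - rho) * math.log((1 - rho) / (1 - rho_hat))


def sparse_ae_loss(
    batch: List[Vec], encoder: List[Layer], decoder: List[Layer], rho: float = 0.05, beta: float = 3.0
) -> Tuple[float, List[Vec]]:
    """Mean reconstruction error plus beta times the KL sparsity penalty on encoder layers."""
    per_layer_acts: List[List[Vec]] = [[] for _ in encoder]
    codes: List[Vec] = []
    recon = 0.0
    for x in batch:
        a = x
        for layer_idx, layer in enumerate(encoder):
            a = dense(a, layer)
            per_layer_acts[layer_idx].append(a)
        codes.append(a)  # bottleneck activation = extracted pair feature
        x_hat = a
        for layer in decoder:
            x_hat = dense(x_hat, layer)
        recon += 0.5 * sum((xi - yi) ** 2 for xi, yi in zip(x, x_hat))
    recon /= len(batch)
    penalty = 0.0
    for acts in per_layer_acts:
        for unit_values in zip(*acts):  # one tuple per hidden unit across the batch
            penalty += kl_divergence(rho, sum(unit_values) / len(unit_values))
    return recon + beta * penalty, codes
```

With untrained weights the sigmoid units sit near 0.5, far from ρ = 0.05, so the sparsity term dominates the loss; minimizing it pushes most hidden units toward being inactive for any given pair.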