Protein structural domain-disease association prediction based on heterogeneous networks
| Published in: | BMC genomics Vol. 23; no. Suppl 6; pp. 869 - 15 |
|---|---|
| Format: | Journal Article |
| Language: | English |
| Published: | London: BioMed Central, 10.04.2025; BioMed Central Ltd; Springer Nature B.V; BMC |
| ISSN: | 1471-2164 |
| Summary: | Background
Domains can be viewed as portable units of protein structure, folding, function, evolution, and design. Small proteins often consist of a single domain, while most large proteins comprise multiple domains that together carry out composite cellular functions. Dysfunction in a domain may impair protein function in some diseases. Inferring disease-related domains will improve our understanding of the mechanisms of complex human diseases.
Results
In this study, we first build a global heterogeneous information network from structure-based domains, proteins, and diseases. We then extract topological features of the network along the meta-paths between domain and disease nodes. Finally, we train a binary classifier based on the XGBoost (eXtreme Gradient Boosting) algorithm to predict potential associations between domains and diseases. The XGBoost-based classifier performs significantly better than models using other machine learning algorithms, achieving an AUC (area under the curve) of 0.94 in leave-one-out cross-validation.
Conclusions
We develop a method that builds a binary classifier from meta-path-based topological features to predict potential associations between domains and diseases. Its predictive performance on independent test sets shows the method to be effective. Moreover, representing domains and diseases by integrating more multi-omics data will further improve predictive performance. |
|---|---|
| DOI: | 10.1186/s12864-024-11117-0 |
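
The summary describes extracting topological features along meta-paths between domain and disease nodes. The sketch below illustrates one common way to do this: counting path instances for two assumed meta-paths, domain-protein-disease and domain-protein-protein-disease. The toy edges, the node names, the helper `count_meta_paths`, and the choice of meta-paths are all hypothetical; the record does not state which meta-paths or feature definitions the authors used.

```python
from typing import Dict, Set, Tuple

# Toy, hypothetical edges for illustration only (not data from the article).
domain_protein: Dict[str, Set[str]] = {"D1": {"P1", "P2"}, "D2": {"P3"}}
protein_protein: Dict[str, Set[str]] = {"P1": {"P3"}, "P2": set(), "P3": {"P1"}}
protein_disease: Dict[str, Set[str]] = {"P1": {"X"}, "P2": {"X"}, "P3": {"Y"}}


def count_meta_paths(domain: str, disease: str) -> Tuple[int, int]:
    """Count path instances between one domain and one disease.

    Returns (n_dpd, n_dppd), where
      n_dpd  = number of domain-protein-disease paths,
      n_dppd = number of domain-protein-protein-disease paths.
    Both meta-paths are assumptions made for this sketch.
    """
    proteins = domain_protein.get(domain, set())

    # domain -> protein -> disease
    n_dpd = sum(1 for p in proteins if disease in protein_disease.get(p, set()))

    # domain -> protein -> interacting protein -> disease
    n_dppd = sum(
        1
        for p in proteins
        for q in protein_protein.get(p, set())
        if disease in protein_disease.get(q, set())
    )
    return n_dpd, n_dppd


# One feature vector per (domain, disease) pair.
diseases = sorted({d for ds in protein_disease.values() for d in ds})
features = {
    (dom, dis): count_meta_paths(dom, dis)
    for dom in sorted(domain_protein)
    for dis in diseases
}

for pair, vector in features.items():
    print(pair, vector)
```

Feature vectors of this kind, paired with known association labels, could then be passed to a gradient-boosting classifier such as XGBoost and scored by AUC under leave-one-out cross-validation, as the summary reports; that training step is omitted here.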