A Deep Learning Model to Predict the ncRNA-Protein Interactions Based on Sequences Information Only
| Published in: | Bioinformatics and Biology Insights, Vol. 19; p. 11779322251391075 |
|---|---|
| Main authors: | , , |
| Format: | Journal Article |
| Language: | English |
| Published: | United States: SAGE Publishing, 01.01.2025 |
| Subjects: | |
| ISSN: | 1177-9322 |
| Online access: | Full text |
| Abstract: | Noncoding RNAs (ncRNAs) play significant roles in multiple fundamental biological processes. In particular, ncRNA interactions provide valuable insights into protein synthesis, control of gene expression, RNA processing, regulation of localization, etc. Dysregulation of ncRNA interactions may cause severe diseases, including cancer. Therefore, developing computational methods for investigating ncRNA-protein interactions has become a problem of interest for researchers. In this study, we propose a novel deep learning (DL) model named RPI-SDA-XGBoost for predicting interactions between ncRNAs and proteins. We use the 3-mer conjoint triad feature (CTF) to encode the protein sequence and the 4-mer frequency to encode the RNA sequence, yielding a 599-dimensional feature vector in total. The DL approach is based on stacked denoising autoencoders (SDA) that discover high-level hidden characteristics from 2 separate networks representing proteins and ncRNAs. The combined features are fed into an XGBoost-based meta-learner for the final prediction. The proposed model, RPI-SDA-XGBoost, outperformed most of the individual baseline models and significantly improved performance on multiple benchmark data sets. We validate the generalization power of the proposed model on five benchmark data sets, namely RPI_369, RPI_488, RPI_1807, RPI_2241, and NPInter v2.0. RPI-SDA-XGBoost achieved accuracy similar to the state of the art on data sets RPI_488, RPI_1807, and RPI_NPInter v2.0. The proposed model achieved the best precision, 87.9% and 94.6%, on the two largest data sets, RPI_2241 and RPI_NPInter v2.0, respectively. We believe the proposed model provides useful direction for upcoming biological research and suggests that more sophisticated computational approaches are warranted in the near future for ncRNA-protein interaction prediction. |
|---|---|
| DOI: | 10.1177/11779322251391075 |
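
The sequence encoding described in the abstract can be illustrated with a minimal sketch. It shows how a 3-mer conjoint triad encoding of a protein (343 values) and a 4-mer frequency encoding of an RNA (256 values) add up to the 599 dimensions mentioned. Several details are assumptions, not taken from the record: the 7-class amino-acid grouping is the one commonly used for conjoint triads, the normalization is a plain relative frequency, and all function and variable names are hypothetical. It is not the authors' implementation.

```python
from itertools import product

# Assumption: the commonly used 7-class amino-acid grouping for conjoint triads.
# The abstract does not state which grouping the authors used.
AA_GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_GROUP = {aa: str(i) for i, grp in enumerate(AA_GROUPS) for aa in grp}

# All possible triads over 7 classes (7^3 = 343) and 4-mers over ACGU (4^4 = 256).
PROTEIN_TRIADS = ["".join(p) for p in product("0123456", repeat=3)]
RNA_4MERS = ["".join(p) for p in product("ACGU", repeat=4)]


def kmer_frequencies(seq, k, vocabulary):
    """Return the relative frequency of each k-mer in `vocabulary` within `seq`."""
    counts = dict.fromkeys(vocabulary, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # windows containing unknown symbols are skipped
            counts[kmer] += 1
            total += 1
    # Assumption: plain relative frequencies; other normalizations are possible.
    return [counts[v] / total if total else 0.0 for v in vocabulary]


def encode_protein(seq):
    """Map residues to their class label, then count class triads (343 values)."""
    reduced = "".join(AA_TO_GROUP.get(aa, "X") for aa in seq.upper())
    return kmer_frequencies(reduced, 3, PROTEIN_TRIADS)


def encode_rna(seq):
    """Count 4-mers over the ACGU alphabet (256 values); T is read as U."""
    rna = seq.upper().replace("T", "U")
    return kmer_frequencies(rna, 4, RNA_4MERS)


def encode_pair(protein_seq, rna_seq):
    """Concatenate both encodings into one 343 + 256 = 599-dimensional vector."""
    return encode_protein(protein_seq) + encode_rna(rna_seq)
```

Since the abstract describes 2 separate networks for proteins and ncRNAs, a model along those lines would presumably consume the outputs of `encode_protein` and `encode_rna` separately; `encode_pair` is included only to make the 599-dimension total concrete.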