Inferring protein from transcript abundances using convolutional neural networks
| Published in: | BioData mining Vol. 18; no. 1; pp. 18 - 15 |
|---|---|
| Main Authors: | , |
| Format: | Journal Article |
| Language: | English |
| Published: | London: BioMed Central, 27.02.2025 (BioMed Central Ltd; Springer Nature B.V; BMC) |
| Subjects: | |
| ISSN: | 1756-0381 |
| Summary: | Background: Although transcript abundance is often used as a proxy for protein abundance, it is an unreliable predictor. As proteins execute biological functions and their expression levels influence phenotypic outcomes, we developed a convolutional neural network (CNN) to predict protein abundances from mRNA abundances, protein sequence, and mRNA sequence in Homo sapiens (H. sapiens) and the reference plant Arabidopsis thaliana (A. thaliana). Results: After hyperparameter optimization and initial data exploration, we implemented distinct training modules for value-based and sequence-based data. By analyzing the learned weights, we revealed common and organism-specific sequence features that influence protein-to-mRNA ratios (PTRs), including known and putative sequence motifs. Adding condition-specific protein interaction information identified genes correlated with many PTRs but did not improve predictions, likely due to insufficient data. The integrated model predicted protein abundance on unseen genes with a coefficient of determination (r²) of 0.30 in H. sapiens and 0.32 in A. thaliana. Conclusions: For H. sapiens, our model improves prediction performance by nearly 50% compared to previous sequence-based approaches, and for A. thaliana it represents the first model of its kind. The model’s learned motifs recapitulate known regulatory elements, supporting its utility in systems-level and hypothesis-driven research approaches related to protein regulation. |
|---|---|
| DOI: | 10.1186/s13040-025-00434-z |
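
The summary reports two quantities that are easy to state concretely: the protein-to-mRNA ratio (PTR) and the coefficient of determination. The sketch below shows one common way to compute each. The function names `log_ptr` and `r_squared` are hypothetical, and this record does not say which r² definition or which PTR scaling the authors used, so the log10 ratio and the `1 - SS_res / SS_tot` form are assumptions made for illustration.

```python
import math


def log_ptr(protein_abundance, mrna_abundance):
    """Log10 protein-to-mRNA ratio for one gene (assumed scaling).

    Both abundances must be strictly positive.
    """
    return math.log10(protein_abundance / mrna_abundance)


def r_squared(observed, predicted):
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed definition)."""
    mean_observed = sum(observed) / len(observed)
    # Residual sum of squares: squared error of the predictions.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: squared deviation of observations from their mean.
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

Under this definition, a value of 0.30 would mean that the predictions account for about 30% of the variance in observed protein abundance across the held-out genes, while a model that always predicts the mean scores 0.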