SPACE: STRING proteins as complementary embeddings
Representation learning has revolutionized sequence-based prediction of protein function and subcellular localization. Protein networks are an important source of information complementary to sequences, but they have proven challenging to use in machine learning, especially in a cross-species set...
| Published in: | bioRxiv |
|---|---|
| Main Authors: | Dewei Hu, Damian Szklarczyk, Christian von Mering, Lars Juhl Jensen |
| Format: | Paper |
| Language: | English |
| Published: | Cold Spring Harbor: Cold Spring Harbor Laboratory Press / Cold Spring Harbor Laboratory, 26.11.2024 |
| Edition: | 1.1 |
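| Subjects: | Amino acid sequence; Bioinformatics; Embedding; Localization; Orthology; Predictions; Protein sources; Proteins; Species |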
| ISSN: | 2692-8205 |
| Abstract | Representation learning has revolutionized sequence-based prediction of protein function and subcellular localization. Protein networks are an important source of information complementary to sequences, but using them in machine learning has proven challenging, especially in a cross-species setting. To address this, we leveraged the STRING database of protein networks and orthology relations for 1,322 eukaryotes to generate network-based cross-species protein embeddings. We did this by first creating species-specific network embeddings and subsequently aligning them based on orthology relations to facilitate direct cross-species comparisons. We show that these aligned network embeddings ensure consistency across species without sacrificing quality compared to species-specific network embeddings. We also show that the aligned network embeddings are complementary to sequence embedding techniques, despite the use of sequence-based orthology relations in the alignment process. Finally, we demonstrate the utility and quality of the embeddings by using them for two well-established tasks: subcellular localization prediction and protein function prediction. Training logistic regression classifiers on aligned network embeddings and sequence embeddings improved the accuracy over using sequence alone, reaching performance numbers close to state-of-the-art deep-learning methods. A set of precomputed cross-species network embeddings and ProtT5 embeddings for all eukaryotic proteins has been included in STRING version 12.0. Competing Interest Statement: The authors have declared no competing interest. Footnotes: https://github.com/deweihu96/SPACE |
|---|---|
| Author | Hu, Dewei; Szklarczyk, Damian; von Mering, Christian; Jensen, Lars Juhl |
| Author_xml | – sequence: 1 givenname: Dewei surname: Hu fullname: Hu, Dewei – sequence: 2 givenname: Damian surname: Szklarczyk fullname: Szklarczyk, Damian – sequence: 3 fullname: Christian Von Mering – sequence: 4 fullname: Lars Juhl Jensen |
| ContentType | Paper |
| Copyright | 2024. This article is published under http://creativecommons.org/licenses/by/4.0/ (“the License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. 2024, Posted by Cold Spring Harbor Laboratory |
| DOI | 10.1101/2024.11.25.625140 |
| Discipline | Biology |
| EISSN | 2692-8205 |
| Edition | 1.1 |
| ExternalDocumentID | 2024.11.25.625140v1 |
| Genre | Working Paper/Pre-Print |
| ISSN | 2692-8205 |
| IngestDate | Tue Jan 07 18:49:53 EST 2025 Fri Jul 25 09:17:09 EDT 2025 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | false |
| IsScholarly | false |
| Keywords | Protein embedding; Function prediction; Networks |
| Language | English |
| License | This pre-print is available under a Creative Commons License (Attribution 4.0 International), CC BY 4.0, as described at http://creativecommons.org/licenses/by/4.0 |
| Notes | Competing Interest Statement: The authors have declared no competing interest. |
| ORCID | 0000-0001-7885-715X; 0000-0002-4052-5069; 0000-0001-7734-9102; 0009-0005-5823-1498 |
| PageCount | 13 |
| PublicationCentury | 2000 |
| PublicationDate | 20241126 |
| PublicationDateYYYYMMDD | 2024-11-26 |
| PublicationDate_xml | – month: 11 year: 2024 text: 20241126 day: 26 |
| PublicationDecade | 2020 |
| PublicationPlace | Cold Spring Harbor |
| PublicationPlace_xml | – name: Cold Spring Harbor |
| PublicationTitle | bioRxiv |
| PublicationYear | 2024 |
| Publisher | Cold Spring Harbor Laboratory Press; Cold Spring Harbor Laboratory |
| Publisher_xml | – name: Cold Spring Harbor Laboratory Press – name: Cold Spring Harbor Laboratory |
| SecondaryResourceType | preprint |
| SourceID | biorxiv proquest |
| SourceType | Open Access Repository Aggregation Database |
| SubjectTerms | Amino acid sequence; Bioinformatics; Embedding; Localization; Orthology; Predictions; Protein sources; Proteins; Species |
| Title | SPACE: STRING proteins as complementary embeddings |
| URI | https://www.proquest.com/docview/3133039781 https://www.biorxiv.org/content/10.1101/2024.11.25.625140 |
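
The abstract reports that logistic regression classifiers trained on aligned network embeddings together with sequence embeddings were more accurate than classifiers trained on sequence alone. The sketch below illustrates one way such a combination could look. It is not the authors' code. It assumes the two embeddings are simply concatenated per protein (the abstract does not state how they were combined), uses invented protein IDs, vectors, and labels, and fits a binary classifier with hand-written gradient descent instead of any particular library.

```python
import math

# Hypothetical toy data: IDs, vectors, and labels are invented for illustration.
network_emb = {
    "prot_a": [1.0, 0.2], "prot_b": [0.9, 0.1],
    "prot_c": [-1.0, 0.3], "prot_d": [-0.8, -0.1],
}
sequence_emb = {
    "prot_a": [0.5], "prot_b": [0.7],
    "prot_c": [-0.6], "prot_d": [-0.4],
}
labels = {"prot_a": 1, "prot_b": 1, "prot_c": 0, "prot_d": 0}


def concat_embeddings(net, seq):
    """Join network and sequence vectors for proteins present in both (assumed scheme)."""
    return {pid: net[pid] + seq[pid] for pid in net if pid in seq}


def train_logreg(X, y, lr=0.5, epochs=200):
    """Fit binary logistic regression with plain batch gradient descent."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
            err = p - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        # Step against the mean gradient of the log loss.
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(w, b, x):
    """Return 1 if the linear score is non-negative, else 0."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if z >= 0 else 0


features = concat_embeddings(network_emb, sequence_emb)
protein_ids = sorted(features)
X = [features[pid] for pid in protein_ids]
y = [labels[pid] for pid in protein_ids]
w, b = train_logreg(X, y)
predictions = [predict(w, b, x) for x in X]
```

In practice the toy dictionaries would be replaced by real per-protein vectors, such as the precomputed network and ProtT5 embeddings the abstract says are distributed with STRING version 12.0, and the classifier would be evaluated on held-out proteins, not on its own training set.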