SPACE: STRING proteins as complementary embeddings

Bibliographic Details
Published in: bioRxiv
Main Authors: Hu, Dewei; Szklarczyk, Damian; von Mering, Christian; Jensen, Lars Juhl
Format: Paper
Language: English
Published: Cold Spring Harbor: Cold Spring Harbor Laboratory Press / Cold Spring Harbor Laboratory, 26.11.2024
Edition: 1.1
ISSN: 2692-8205
Abstract Representation learning has revolutionized sequence-based prediction of protein function and subcellular localization. Protein networks are an important source of information complementary to sequences, but using them in machine learning has proven challenging, especially in a cross-species setting. To address this, we leveraged the STRING database of protein networks and orthology relations for 1,322 eukaryotes to generate network-based cross-species protein embeddings. We did this by first creating species-specific network embeddings and subsequently aligning them based on orthology relations to facilitate direct cross-species comparisons. We show that these aligned network embeddings ensure consistency across species without sacrificing quality compared to species-specific network embeddings. We also show that the aligned network embeddings are complementary to sequence embedding techniques, despite the use of sequence-based orthology relations in the alignment process. Finally, we demonstrate the utility and quality of the embeddings by using them for two well-established tasks: subcellular localization prediction and protein function prediction. Training logistic regression classifiers on aligned network embeddings and sequence embeddings improved the accuracy over using sequence alone, reaching performance numbers close to state-of-the-art deep-learning methods. A set of precomputed cross-species network embeddings and ProtT5 embeddings for all eukaryotic proteins has been included in STRING version 12.0.
Competing Interest Statement: The authors have declared no competing interest.
Footnotes: https://github.com/deweihu96/SPACE
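The last step described in the abstract, training logistic regression on network and sequence embeddings together, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not say how the two embedding types are combined, so per-protein concatenation is assumed; the one-dimensional "embeddings" and labels are synthetic toy values, not STRING or ProtT5 data; and the hand-written gradient-descent classifier stands in for whatever implementation the authors used. Accuracy is measured on the training points only, so the numbers say nothing about the paper's results.

```python
import math


def train_logreg(X, y, lr=0.5, epochs=500):
    """Fit binary logistic regression by batch gradient descent.

    Returns (weights, bias). Hypothetical helper, not from the SPACE code.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
            err = p - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        # Average the gradient over the batch and take one step.
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def accuracy(X, y, w, b):
    """Fraction of points whose predicted class (z > 0) matches the label."""
    correct = 0
    for xi, yi in zip(X, y):
        z = sum(wj * xj for wj, xj in zip(w, xi)) + b
        correct += int((z > 0) == (yi == 1))
    return correct / len(X)


# Synthetic toy data (assumption): one "sequence" feature and one "network"
# feature per protein. The label is 1 exactly when their sum is positive,
# so neither feature alone determines the label.
seq_emb = [[1.0], [-1.0], [1.0], [-1.0], [2.0], [-2.0]]
net_emb = [[1.0], [-1.0], [-2.0], [2.0], [-1.0], [1.0]]
labels = [1, 0, 0, 1, 1, 0]

# Assumed combination strategy: concatenate the two vectors per protein.
combined = [s + n for s, n in zip(seq_emb, net_emb)]

w_seq, b_seq = train_logreg(seq_emb, labels)
w_all, b_all = train_logreg(combined, labels)

acc_seq = accuracy(seq_emb, labels, w_seq, b_seq)
acc_combined = accuracy(combined, labels, w_all, b_all)
print(f"sequence only: {acc_seq:.2f}  sequence + network: {acc_combined:.2f}")
```

On this toy set the concatenated features separate the two classes while the sequence feature alone cannot, which is the sense of "complementary" used in the abstract. With real embeddings one would swap in the precomputed vectors and evaluate on held-out proteins.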
Copyright 2024. This article is published under http://creativecommons.org/licenses/by/4.0/ (“the License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
2024, Posted by Cold Spring Harbor Laboratory
DOI 10.1101/2024.11.25.625140
Discipline Biology
EISSN 2692-8205
Edition 1.1
ExternalDocumentID 2024.11.25.625140v1
Genre Working Paper/Pre-Print
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
Keywords Protein embedding
Function prediction
Networks
Language English
License This pre-print is available under a Creative Commons License (Attribution 4.0 International), CC BY 4.0, as described at http://creativecommons.org/licenses/by/4.0
ORCID 0000-0001-7885-715X
0000-0002-4052-5069
0000-0001-7734-9102
0009-0005-5823-1498
PageCount 13
SubjectTerms Amino acid sequence
Bioinformatics
Embedding
Localization
Orthology
Predictions
Protein sources
Proteins
Species
Title SPACE: STRING proteins as complementary embeddings
URI https://www.proquest.com/docview/3133039781
https://www.biorxiv.org/content/10.1101/2024.11.25.625140