SPACE: STRING proteins as complementary embeddings

Detailed bibliography
Published in: Bioinformatics (Oxford, England), Vol. 41, Issue 9
Main authors: Hu, Dewei; Szklarczyk, Damian; von Mering, Christian; Jensen, Lars Juhl
Medium: Journal Article
Language: English
Published: England, Oxford University Press, 01.09.2025
Oxford Publishing Limited (England)
ISSN: 1367-4811, 1367-4803
Abstract
Motivation: Representation learning has revolutionized sequence-based prediction of protein function and subcellular localization. Protein networks are an important source of information complementary to sequences, but the use of protein networks has proven to be challenging in the context of machine learning, especially in a cross-species setting.
Results: We leveraged the STRING database of protein networks and orthology relations for 1322 eukaryotes to generate network-based cross-species protein embeddings. We did this by first creating species-specific network embeddings and subsequently aligning them based on orthology relations to facilitate direct cross-species comparisons. We show that these aligned network embeddings ensure consistency across species without sacrificing quality compared to species-specific network embeddings. We also show that the aligned network embeddings are complementary to sequence embedding techniques, despite the use of sequence-based orthology relations in the alignment process. Finally, we validated the embeddings by using them for two well-established tasks: subcellular localization prediction and protein function prediction. Training logistic regression classifiers on aligned network embeddings and sequence embeddings improved the accuracy over using sequence alone, reaching performance numbers close to state-of-the-art deep-learning methods.
Availability and implementation: The source code and scripts for generating the network-based cross-species protein embeddings are available at https://github.com/deweihu96/SPACE. Precomputed network embeddings and sequence embeddings for all eukaryotic proteins are included in STRING version 12.0 (https://string-db.org/cgi/download).
Supplementary information: Supplementary data are available at Bioinformatics online.
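The abstract names two computational steps: aligning species-specific network embeddings through orthology relations, and training logistic regression on network plus sequence embeddings. It does not say how the alignment is performed, so the minimal NumPy sketch below swaps in orthogonal Procrustes, a standard way to map one embedding space onto another through anchor pairs. It is an illustration under that assumption, not the SPACE implementation (that lives in the GitHub repository above). Random toy vectors stand in for STRING network and sequence embeddings, and the names `procrustes_align`, `train_logistic_regression` and the `(source_row, target_row)` anchor format are hypothetical.

```python
import numpy as np


def procrustes_align(source, target, anchor_pairs):
    """Map `source` embeddings into the coordinate system of `target`.

    anchor_pairs is a hypothetical list of (source_row, target_row) index
    pairs standing in for orthologs shared by the two species. Solves the
    orthogonal Procrustes problem: min ||A @ R - B||_F with R.T @ R = I.
    """
    src_idx = [s for s, _ in anchor_pairs]
    tgt_idx = [t for _, t in anchor_pairs]
    a, b = source[src_idx], target[tgt_idx]
    u, _, vt = np.linalg.svd(a.T @ b)
    ortho = u @ vt  # d x d orthogonal matrix (rotation, possibly with reflection)
    return source @ ortho  # applied to every protein, not only the anchors


def train_logistic_regression(x, y, lr=0.1, epochs=500, l2=1e-3):
    """Binary logistic regression fitted by full-batch gradient descent."""
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted probabilities
        w -= lr * (x.T @ (p - y) / len(y) + l2 * w)
        b -= lr * np.mean(p - y)
    return w, b


# Toy data only: random vectors stand in for STRING network embeddings.
rng = np.random.default_rng(0)
dim = 8
target = rng.normal(size=(50, dim))               # "reference species"
q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))  # random orthogonal matrix
source = target @ q                               # same geometry, other axes

# Use only the first 20 rows as orthologous anchors.
aligned = procrustes_align(source, target, [(i, i) for i in range(20)])

# Concatenate aligned network embeddings with (here random) sequence embeddings.
sequence = rng.normal(size=(50, 4))
features = np.hstack([aligned, sequence])
labels = (target[:, 0] > 0).astype(float)         # toy binary label

w, b = train_logistic_regression(features, labels)
accuracy = np.mean(((features @ w + b) > 0) == labels)
print(f"max alignment error: {np.abs(aligned - target).max():.2e}")
print(f"training accuracy: {accuracy:.2f}")
```

Because the toy "second species" is an exact orthogonal transform of the first, 20 anchors recover the mapping for all 50 rows; real orthology-based alignment is noisier. Multi-class tasks such as subcellular localization would typically use one-vs-rest classifiers or a multinomial model rather than the single binary classifier shown here.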
View this record in MEDLINE/PubMed: https://www.ncbi.nlm.nih.gov/pubmed/40924541
Copyright The Author(s) 2025. Published by Oxford University Press.
DOI 10.1093/bioinformatics/btaf496
Discipline Biology
EISSN 1367-4811
Funding: Novo Nordisk Foundation (NNF20SA0035590, NNF14CC0001); Swiss Institute of Bioinformatics
Peer reviewed: yes; Open Access: yes
License This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
ORCID Hu, Dewei: 0009-0005-5823-1498; von Mering, Christian: 0000-0001-7734-9102
OpenAccessLink https://dx.doi.org/10.1093/bioinformatics/btaf496
PMID 40924541
PQID 3253137840
PQPubID 36124
ParticipantIDs proquest_miscellaneous_3248446211
proquest_journals_3253137840
pubmed_primary_40924541
crossref_primary_10_1093_bioinformatics_btaf496
oup_primary_10_1093_bioinformatics_btaf496
PublicationCentury 2000
PublicationDate 2025-09-01
PublicationDateYYYYMMDD 2025-09-01
PublicationDate_xml – month: 09
  year: 2025
  text: 2025-09-01
  day: 01
PublicationDecade 2020
PublicationPlace England
PublicationPlace_xml – name: England
– name: Oxford
PublicationTitle Bioinformatics (Oxford, England)
PublicationTitleAlternate Bioinformatics
PublicationYear 2025
Publisher Oxford University Press
Oxford Publishing Limited (England)
Publisher_xml – name: Oxford University Press
– name: Oxford Publishing Limited (England)
References Saleem (2025092219523514700_btaf496-B44) 2006; 1763
Pokharel (2025092219523514700_btaf496-B40) 2022; 12
Suzek (2025092219523514700_btaf496-B45) 2015; 31
Perozzi (2025092219523514700_btaf496-B39) 2014
De Matteis (2025092219523514700_btaf496-B11) 2008; 9
Mikolov (2025092219523514700_btaf496-B35) 2013
Li (2025092219523514700_btaf496-B29) 2023; 39
Dubey (2025092219523514700_btaf496-B14) 2011; 3
Aleksander (2025092219523514700_btaf496-B1) 2023; 224
Mancuso (2025092219523514700_btaf496-B32) 2024; 20
Le (2025092219523514700_btaf496-B28) 2017
Baumgartner (2025092219523514700_btaf496-B3) 2023; 75
Rives (2025092219523514700_btaf496-B42) 2021; 118
Joulin (2025092219523514700_btaf496-B22) 2018
Lin (2025092219523514700_btaf496-B30) 2023; 379
Vendruscolo (2025092219523514700_btaf496-B49) 2022; 13
Grover (2025092219523514700_btaf496-B17) 2016
Ashburner (2025092219523514700_btaf496-B2) 2000; 25
Fan (2025092219523514700_btaf496-B16) 2019; 47
Braulke (2025092219523514700_btaf496-B7) 2009; 1793
Kalinowski (2025092219523514700_btaf496-B23) 2020
Bernhofer (2025092219523514700_btaf496-B4) 2022; 23
Elnaggar (2025092219523514700_btaf496-B15) 2022; 44
Thumuluri (2025092219523514700_btaf496-B47) 2022; 50
Khoshraftar (2025092219523514700_btaf496-B25) 2024; 15
Chu (2025092219523514700_btaf496-B9) 2019
Kanehisa (2025092219523514700_btaf496-B24) 2021; 49
Kouba (2025092219523514700_btaf496-B27) 2023; 13
Bonetta (2025092219523514700_btaf496-B5) 2020; 88
De Las Rivas (2025092219523514700_btaf496-B10) 2010; 6
Patra (2025092219523514700_btaf496-B38) 2019
Szklarczyk (2025092219523514700_btaf496-B46) 2023; 51
Yuan (2025092219523514700_btaf496-B57) 2024
Raffel (2025092219523514700_btaf496-B41) 2020; 21
Yao (2025092219523514700_btaf496-B55) 2021; 49
Hasselgren (2025092219523514700_btaf496-B18) 2024; 64
Du (2025092219523514700_btaf496-B13) 2019
Chicco (2025092219523514700_btaf496-B8) 2020; 21
Kipf (2025092219523514700_btaf496-B26) 2016
Pan (2025092219523514700_btaf496-B37) 2022; 19
Vamathevan (2025092219523514700_btaf496-B48) 2019; 18
Villegas-Morcillo (2025092219523514700_btaf496-B50) 2022; 23
Dönnes (2025092219523514700_btaf496-B12) 2004; 2
Liu (2025092219523514700_btaf496-B31) 2023; 39
Martins (2025092219523514700_btaf496-B33) 2023
Zhou (2025092219523514700_btaf496-B58) 2020; 1
Heinzinger (2025092219523514700_btaf496-B20) 2022; 4
Brandes (2025092219523514700_btaf496-B6) 2022; 38
Wang (2025092219523514700_btaf496-B52) 2014; 9
Xia (2025092219523514700_btaf496-B54) 2020; 4
Heimann (2025092219523514700_btaf496-B19) 2018
Wang (2025092219523514700_btaf496-B51) 2023; 21
Hernández-Plaza (2025092219523514700_btaf496-B21) 2023; 51
Munro (2025092219523514700_btaf496-B36) 1998; 8
Rost (2025092219523514700_btaf496-B43) 2016; 590
You (2025092219523514700_btaf496-B56) 2019; 47
References_xml – volume: 118
  start-page: e2016239118
  year: 2021
  ident: 2025092219523514700_btaf496-B42
  article-title: Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences
  publication-title: Proc Natl Acad Sci USA
  doi: 10.1073/pnas.2016239118
– volume: 39
  start-page: btad529
  year: 2023
  ident: 2025092219523514700_btaf496-B29
  article-title: Joint embedding of biological networks for cross-species functional alignment
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btad529
– volume: 39
  start-page: btad047
  year: 2023
  ident: 2025092219523514700_btaf496-B31
  article-title: Accurately modeling biased random walks on weighted networks using node2vec
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btad047
– volume: 224
  start-page: iyad031
  year: 2023
  ident: 2025092219523514700_btaf496-B1
  article-title: The gene ontology knowledgebase in 2023
  publication-title: Genetics
  doi: 10.1093/genetics/iyad031
– volume: 590
  start-page: 2327
  year: 2016
  ident: 2025092219523514700_btaf496-B43
  article-title: Protein function in precision medicine: deep understanding with machine learning
  publication-title: FEBS Lett
  doi: 10.1002/1873-3468.12307
– year: 2016
  ident: 2025092219523514700_btaf496-B17
– volume: 64
  start-page: 527
  year: 2024
  ident: 2025092219523514700_btaf496-B18
  article-title: Artificial intelligence for drug discovery: are we there yet?
  publication-title: Annu Rev Pharmacol Toxicol
  doi: 10.1146/annurev-pharmtox-040323-040828
– volume: 49
  start-page: W469
  year: 2021
  ident: 2025092219523514700_btaf496-B55
  article-title: Netgo 2.0: improving large-scale protein function prediction with massive sequence, text, domain, family and network information
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkab398
– volume: 47
  start-page: W379
  year: 2019
  ident: 2025092219523514700_btaf496-B56
  article-title: Netgo: improving large-scale protein function prediction with massive network information
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkz388
– volume: 25
  start-page: 25
  year: 2000
  ident: 2025092219523514700_btaf496-B2
  article-title: Gene ontology: tool for the unification of biology
  publication-title: Nat Genet
  doi: 10.1038/75556
– volume: 21
  start-page: 349
  year: 2023
  ident: 2025092219523514700_btaf496-B51
  article-title: Netgo 3.0: protein language model improves large-scale functional annotations
  publication-title: Genom Proteom Bioinform
  doi: 10.1016/j.gpb.2023.04.001
– volume: 50
  start-page: W228
  year: 2022
  ident: 2025092219523514700_btaf496-B47
  article-title: Deeploc 2.0: multi-label subcellular localization prediction using protein language models
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkac278
– start-page: 117
  year: 2018
  ident: 2025092219523514700_btaf496-B19
– start-page: 242
  volume-title: 2017 4th NAFOSTED Conference on Information and Computer Science
  year: 2017
  ident: 2025092219523514700_btaf496-B28
  doi: 10.1109/NAFOSTED.2017.8108071
– volume: 19
  start-page: 666
  year: 2022
  ident: 2025092219523514700_btaf496-B37
  article-title: Identifying protein subcellular locations with embeddings-based node2loc
  publication-title: IEEE/ACM Trans Comput Biol Bioinform
  doi: 10.1109/TCBB.2021.3080386
– year: 2020
  ident: 2025092219523514700_btaf496-B23
– volume: 1763
  start-page: 1541
  year: 2006
  ident: 2025092219523514700_btaf496-B44
  article-title: Proteomics of the peroxisome
  publication-title: Biochim Biophys Acta
  doi: 10.1016/j.bbamcr.2006.09.005
– volume: 12
  start-page: 16933
  year: 2022
  ident: 2025092219523514700_btaf496-B40
  article-title: Improving protein succinylation sites prediction using embeddings from protein language model
  publication-title: Sci Rep
  doi: 10.1038/s41598-022-21366-2
– volume: 6
  start-page: e1000807
  year: 2010
  ident: 2025092219523514700_btaf496-B10
  article-title: Protein–protein interactions essentials: key concepts to building and analyzing interactome networks
  publication-title: PLoS Comput Biol
  doi: 10.1371/journal.pcbi.1000807
– volume: 1793
  start-page: 605
  year: 2009
  ident: 2025092219523514700_btaf496-B7
  article-title: Sorting of lysosomal proteins
  publication-title: Biochim Biophys Acta
  doi: 10.1016/j.bbamcr.2008.10.016
– volume: 4
  start-page: lqac043
  year: 2022
  ident: 2025092219523514700_btaf496-B20
  article-title: Contrastive learning on protein embeddings enlightens midnight zone
  publication-title: NAR Genom Bioinform
  doi: 10.1093/nargab/lqac043
– start-page: 184
  volume-title: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
  year: 2019
  ident: 2025092219523514700_btaf496-B38
  doi: 10.18653/v1/P19-1018
– volume: 88
  start-page: 397
  year: 2020
  ident: 2025092219523514700_btaf496-B5
  article-title: Machine learning techniques for protein function prediction
  publication-title: Proteins: Struct Funct Bioinform
  doi: 10.1002/prot.25832
– volume: 9
  start-page: 273
  year: 2008
  ident: 2025092219523514700_btaf496-B11
  article-title: Exiting the Golgi complex
  publication-title: Nat Rev Mol Cell Biol
  doi: 10.1038/nrm2378
– volume: 1
  start-page: 57
  year: 2020
  ident: 2025092219523514700_btaf496-B58
  article-title: Graph neural networks: a review of methods and applications
  publication-title: AI Open
  doi: 10.1016/j.aiopen.2021.01.001
– start-page: 479
  year: 2019
  ident: 2025092219523514700_btaf496-B13
– start-page: 273
  year: 2019
  ident: 2025092219523514700_btaf496-B9
– volume: 4
  start-page: 95
  year: 2020
  ident: 2025092219523514700_btaf496-B54
  article-title: Random walks: a review of algorithms and applications
  publication-title: IEEE Trans Emerg Top Comput Intell
  doi: 10.1109/TETCI.2019.2952908
– year: 2013
  ident: 2025092219523514700_btaf496-B35
– year: 2024
  ident: 2025092219523514700_btaf496-B57
– volume: 44
  start-page: 7112
  year: 2022
  ident: 2025092219523514700_btaf496-B15
  article-title: ProtTrans: toward understanding the language of life through self-supervised learning
  publication-title: IEEE Trans Pattern Anal Mach Intell
  doi: 10.1109/TPAMI.2021.3095381
– volume: 20
  start-page: e1011773
  year: 2024
  ident: 2025092219523514700_btaf496-B32
  article-title: Joint representation of molecular networks from multiple species improves gene classification
  publication-title: PLoS Comput Biol
  doi: 10.1371/journal.pcbi.1011773
– volume: 2
  start-page: 209
  year: 2004
  ident: 2025092219523514700_btaf496-B12
  article-title: Predicting protein subcellular localization: past, present, and future
  publication-title: Genom Proteom Bioinform
  doi: 10.1016/S1672-0229(04)02027-3
– volume: 379
  start-page: 1123
  year: 2023
  ident: 2025092219523514700_btaf496-B30
  article-title: Evolutionary-scale prediction of atomic-level protein structure with a language model
  publication-title: Science
  doi: 10.1126/science.ade2574
– start-page: 701
  year: 2014
  ident: 2025092219523514700_btaf496-B39
– volume: 51
  start-page: D638
  year: 2023
  ident: 2025092219523514700_btaf496-B46
  article-title: The string database in 2023: protein–protein association networks and functional enrichment analyses for any sequenced genome of interest
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkac1000
– volume: 3
  start-page: 392
  year: 2011
  ident: 2025092219523514700_btaf496-B14
  article-title: Subcellular localization of proteins
  publication-title: Arch Appl Sci Res
– year: 2018
  ident: 2025092219523514700_btaf496-B22
– volume: 21
  start-page: 1
  year: 2020
  ident: 2025092219523514700_btaf496-B41
  article-title: Exploring the limits of transfer learning with a unified text-to-text transformer
  publication-title: J Mach Learn Res
– volume: 51
  start-page: D389
  year: 2023
  ident: 2025092219523514700_btaf496-B21
  article-title: eggnog 6.0: enabling comparative genomics across 12 535 organisms
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkac1022
– volume: 23
  start-page: 326
  year: 2022
  ident: 2025092219523514700_btaf496-B4
  article-title: TMbed: transmembrane proteins predicted through language model embeddings
  publication-title: BMC Bioinformatics
  doi: 10.1186/s12859-022-04873-x
– volume: 38
  start-page: 2102
  year: 2022
  ident: 2025092219523514700_btaf496-B6
  article-title: ProteinBERT: a universal deep-learning model of protein sequence and function
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btac020
– volume: 23
  start-page: bbac142
  year: 2022
  ident: 2025092219523514700_btaf496-B50
  article-title: An analysis of protein language model embeddings for fold prediction
  publication-title: Brief Bioinform
  doi: 10.1093/bib/bbac142
– year: 2016
  ident: 2025092219523514700_btaf496-B26
– volume: 75
  start-page: 100741
  year: 2023
  ident: 2025092219523514700_btaf496-B3
  article-title: Towards the web of embeddings: integrating multiple knowledge graph embedding spaces with FedCoder
  publication-title: J Web Semant
  doi: 10.1016/j.websem.2022.100741
– volume: 15
  start-page: 1
  year: 2024
  ident: 2025092219523514700_btaf496-B25
  article-title: A survey on graph representation learning methods
  publication-title: ACM Trans Intell Syst Technol
  doi: 10.1145/3633518
– volume: 21
  start-page: 6
  year: 2020
  ident: 2025092219523514700_btaf496-B8
  article-title: The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation
  publication-title: BMC Genomics
  doi: 10.1186/s12864-019-6413-7
– volume: 13
  start-page: 13863
  year: 2023
  ident: 2025092219523514700_btaf496-B27
  article-title: Machine learning-guided protein engineering
  publication-title: ACS Catal
  doi: 10.1021/acscatal.3c02743
– volume: 31
  start-page: 926
  year: 2015
  ident: 2025092219523514700_btaf496-B45
  article-title: UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btu739
– year: 2023
  ident: 2025092219523514700_btaf496-B33
– volume: 13
  start-page: 5550
  year: 2022
  ident: 2025092219523514700_btaf496-B49
  article-title: Protein condensation diseases: therapeutic opportunities
  publication-title: Nat Commun
  doi: 10.1038/s41467-022-32940-7
– volume: 8
  start-page: 11
  year: 1998
  ident: 2025092219523514700_btaf496-B36
  article-title: Localization of proteins to the Golgi apparatus
  publication-title: Trends Cell Biol
  doi: 10.1016/S0962-8924(97)01197-5
– volume: 9
  start-page: 331
  year: 2014
  ident: 2025092219523514700_btaf496-B52
  article-title: Review of protein subcellular localization prediction
  publication-title: Curr Bioinform
  doi: 10.2174/1574893609666140212000304
– volume: 47
  start-page: e51
  year: 2019
  ident: 2025092219523514700_btaf496-B16
  article-title: Functional protein representations from biological networks enable diverse cross-species inference
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkz132
– volume: 18
  start-page: 463
  year: 2019
  ident: 2025092219523514700_btaf496-B48
  article-title: Applications of machine learning in drug discovery and development
  publication-title: Nat Rev Drug Discov
  doi: 10.1038/s41573-019-0024-5
– volume: 49
  start-page: D545
  year: 2021
  ident: 2025092219523514700_btaf496-B24
  article-title: KEGG: integrating viruses and cellular organisms
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkaa970
SSID ssj0005056
SourceID proquest
pubmed
crossref
oup
SourceType Aggregation Database
Index Database
Publisher
SubjectTerms Amino acid sequence
Availability
Computational Biology - methods
Databases, Protein
Deep Learning
Embedding
Eukaryotes
Humans
Localization
Machine Learning
Networks
Orthology
Predictions
Proteins
Proteins - chemistry
Proteins - metabolism
Sequence Analysis, Protein - methods
Source code
Species
Species comparisons
Strings
Title SPACE: STRING proteins as complementary embeddings
URI https://www.ncbi.nlm.nih.gov/pubmed/40924541
https://www.proquest.com/docview/3253137840
https://www.proquest.com/docview/3248446211
Volume 41
WOSCitedRecordID wos001575687900001
journalDatabaseRights – providerCode: PRVAON
  databaseName: Directory of Open Access Journals
  customDbUrl:
  eissn: 1367-4811
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0005056
  issn: 1367-4811
  databaseCode: DOA
  dateStart: 20230101
  isFulltext: true
  titleUrlDefault: https://www.doaj.org/
  providerName: Directory of Open Access Journals
– providerCode: PRVASL
  databaseName: Oxford Open
  customDbUrl:
  eissn: 1367-4811
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0005056
  issn: 1367-4811
  databaseCode: TOX
  dateStart: 19850101
  isFulltext: true
  titleUrlDefault: https://academic.oup.com/journals/
  providerName: Oxford University Press
linkProvider Oxford University Press
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=SPACE%3A+STRING+proteins+as+complementary+embeddings&rft.jtitle=Bioinformatics+%28Oxford%2C+England%29&rft.au=Hu%2C+Dewei&rft.au=Szklarczyk%2C+Damian&rft.au=von+Mering%2C+Christian&rft.au=Jensen%2C+Lars+Juhl&rft.date=2025-09-01&rft.issn=1367-4811&rft.eissn=1367-4811&rft.volume=41&rft.issue=9&rft_id=info:doi/10.1093%2Fbioinformatics%2Fbtaf496&rft.externalDBID=n%2Fa&rft.externalDocID=10_1093_bioinformatics_btaf496