SPACE: STRING proteins as complementary embeddings
Abstract Motivation Representation learning has revolutionized sequence-based prediction of protein function and subcellular localization. Protein networks are an important source of information complementary to sequences, but the use of protein networks has proven to be challenging in the context o...
| Published in: | Bioinformatics (Oxford, England) Volume 41; Issue 9 |
|---|---|
| Main authors: | Hu, Dewei; Szklarczyk, Damian; von Mering, Christian; Jensen, Lars Juhl |
| Format: | Journal Article |
| Language: | English |
| Published: | England; Oxford University Press; 2025-09-01; Oxford Publishing Limited (England) |
| ISSN: | 1367-4803, 1367-4811 |
| Abstract | Abstract
Motivation
Representation learning has revolutionized sequence-based prediction of protein function and subcellular localization. Protein networks are an important source of information complementary to sequences, but the use of protein networks has proven to be challenging in the context of machine learning, especially in a cross-species setting.
Results
We leveraged the STRING database of protein networks and orthology relations for 1322 eukaryotes to generate network-based cross-species protein embeddings. We did this by first creating species-specific network embeddings and subsequently aligning them based on orthology relations to facilitate direct cross-species comparisons. We show that these aligned network embeddings ensure consistency across species without sacrificing quality compared to species-specific network embeddings. We also show that the aligned network embeddings are complementary to sequence embedding techniques, despite the use of sequence-based orthology relations in the alignment process. Finally, we validated the embeddings by using them for two well-established tasks: subcellular localization prediction and protein function prediction. Training logistic regression classifiers on aligned network embeddings and sequence embeddings improved the accuracy over using sequence alone, reaching performance numbers close to state-of-the-art deep-learning methods.
Availability and implementation
The source code and scripts for generating the network-based cross-species protein embeddings are available at https://github.com/deweihu96/SPACE. Precomputed network embeddings and sequence embeddings for all eukaryotic proteins are included in STRING version 12.0 (https://string-db.org/cgi/download). |
|---|---|
| AbstractList | Supplementary information: Supplementary data are available at Bioinformatics online. |
| Author | Hu, Dewei Jensen, Lars Juhl Szklarczyk, Damian von Mering, Christian |
| Author_xml | – sequence: 1 givenname: Dewei orcidid: 0009-0005-5823-1498 surname: Hu fullname: Hu, Dewei email: larsjuhl.jensen@zs.com – sequence: 2 givenname: Damian surname: Szklarczyk fullname: Szklarczyk, Damian – sequence: 3 givenname: Christian orcidid: 0000-0001-7734-9102 surname: von Mering fullname: von Mering, Christian – sequence: 4 givenname: Lars Juhl surname: Jensen fullname: Jensen, Lars Juhl email: larsjuhl.jensen@zs.com |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/40924541 |
| ContentType | Journal Article |
| Copyright | The Author(s) 2025. Published by Oxford University Press. This work is published under https://creativecommons.org/licenses/by/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
| DOI | 10.1093/bioinformatics/btaf496 |
| Discipline | Biology |
| EISSN | 1367-4811 |
| ExternalDocumentID | 40924541 10_1093_bioinformatics_btaf496 10.1093/bioinformatics/btaf496 |
| Genre | Journal Article |
| GrantInformation_xml | Novo Nordisk Foundation (grant IDs NNF20SA0035590, NNF14CC0001); Swiss Institute of Bioinformatics |
| ISSN | 1367-4811 1367-4803 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 9 |
| Language | English |
| License | This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. https://creativecommons.org/licenses/by/4.0 The Author(s) 2025. Published by Oxford University Press. |
| ORCID | 0009-0005-5823-1498 0000-0001-7734-9102 |
| OpenAccessLink | https://dx.doi.org/10.1093/bioinformatics/btaf496 |
| PMID | 40924541 |
| PublicationCentury | 2000 |
| PublicationDate | 2025-09-01 |
| PublicationPlace | England |
| PublicationTitle | Bioinformatics (Oxford, England) |
| PublicationTitleAlternate | Bioinformatics |
| PublicationYear | 2025 |
| Publisher | Oxford University Press Oxford Publishing Limited (England) |
| SubjectTerms | Amino acid sequence; Availability; Computational Biology - methods; Databases, Protein; Deep Learning; Embedding; Eukaryotes; Humans; Localization; Machine Learning; Networks; Orthology; Predictions; Proteins; Proteins - chemistry; Proteins - metabolism; Sequence Analysis, Protein - methods; Source code; Species; Species comparisons; Strings |
| Title | SPACE: STRING proteins as complementary embeddings |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/40924541 https://www.proquest.com/docview/3253137840 https://www.proquest.com/docview/3248446211 |
| Volume | 41 |
| linkProvider | Oxford University Press |
| Citation | Hu, Dewei; Szklarczyk, Damian; von Mering, Christian; Jensen, Lars Juhl. SPACE: STRING proteins as complementary embeddings. Bioinformatics (Oxford, England), volume 41, issue 9, 2025-09-01. ISSN 1367-4811. doi: 10.1093/bioinformatics/btaf496 |