ProteinBERT: a universal deep-learning model of protein sequence and function
| Published in: | Bioinformatics Vol. 38; no. 8; pp. 2102 - 2110 |
|---|---|
| Main Authors: | Brandes, Nadav; Ofer, Dan; Peleg, Yam; Rappoport, Nadav; Linial, Michal |
| Format: | Journal Article |
| Language: | English |
| Published: | England: Oxford University Press, 12.04.2022 |
| ISSN: | 1367-4803, 1367-4811, 1460-2059 |
| Online Access: | https://dx.doi.org/10.1093/bioinformatics/btac020 |
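The abstract of this record describes a model that works with both local (per-residue) and global (per-protein) representations, pretrained on language modeling combined with Gene Ontology (GO) annotation prediction. The following is a minimal, standard-library sketch of what such paired inputs and their corrupted versions could look like. It is not the authors' implementation: the token names, the vocabulary layout, the function names and the corruption rates (`token_p`, `drop_p`) are all illustrative assumptions, and the real code lives in the repository linked in the abstract.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Hypothetical special tokens; the real model's vocabulary may differ.
SPECIAL_TOKENS = ["<PAD>", "<START>", "<END>", "<OTHER>"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(SPECIAL_TOKENS + list(AMINO_ACIDS))}
RESIDUE_IDS = [TOKEN_TO_ID[aa] for aa in AMINO_ACIDS]


def encode_sequence(seq, seq_len):
    """Local input: one token id per position, padded to seq_len."""
    tokens = ["<START>"]
    tokens += [aa if aa in AMINO_ACIDS else "<OTHER>" for aa in seq]
    tokens += ["<END>"]
    if len(tokens) > seq_len:
        raise ValueError("sequence does not fit into seq_len")
    tokens += ["<PAD>"] * (seq_len - len(tokens))
    return [TOKEN_TO_ID[t] for t in tokens]


def encode_annotations(go_terms, go_vocab):
    """Global input: binary vector with one entry per GO term in go_vocab."""
    present = set(go_terms)
    return [1 if term in present else 0 for term in go_vocab]


def corrupt(token_ids, annotation_vec, rng, token_p=0.05, drop_p=0.25):
    """Return noisy copies of both inputs; the originals act as targets.

    token_p and drop_p are placeholder rates chosen for illustration only.
    """
    noisy_tokens = [
        rng.choice(RESIDUE_IDS) if t in RESIDUE_IDS and rng.random() < token_p else t
        for t in token_ids
    ]
    noisy_annotations = [
        0 if a == 1 and rng.random() < drop_p else a
        for a in annotation_vec
    ]
    return noisy_tokens, noisy_annotations


if __name__ == "__main__":
    rng = random.Random(0)
    go_vocab = ["GO:0003677", "GO:0005524"]  # example vocabulary of two GO terms
    x_local = encode_sequence("MKTAYIAKQR", seq_len=16)
    x_global = encode_annotations(["GO:0005524"], go_vocab)
    print(corrupt(x_local, x_global, rng))
```

In this sketch a model would receive the noisy pair and be trained to recover the clean pair, which is one simple way to read "language modeling combined with annotation prediction"; special tokens are left untouched so that only residues and annotations are perturbed.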
| Abstract | Abstract
Summary
Self-supervised deep language modeling has shown unprecedented success across natural language tasks, and has recently been repurposed to biological sequences. However, existing models and pretraining methods are designed and optimized for text analysis. We introduce ProteinBERT, a deep language model specifically designed for proteins. Our pretraining scheme combines language modeling with a novel task of Gene Ontology (GO) annotation prediction. We introduce novel architectural elements that make the model highly efficient and flexible to long sequences. The architecture of ProteinBERT consists of both local and global representations, allowing end-to-end processing of these types of inputs and outputs. ProteinBERT obtains near state-of-the-art performance, and sometimes exceeds it, on multiple benchmarks covering diverse protein properties (including protein structure, post-translational modifications and biophysical attributes), despite using a far smaller and faster model than competing deep-learning methods. Overall, ProteinBERT provides an efficient framework for rapidly training protein predictors, even with limited labeled data.
Availability and implementation
Code and pretrained model weights are available at https://github.com/nadavbra/protein_bert.
Supplementary information
Supplementary data are available at Bioinformatics online. |
|---|---|
| Author | Brandes, Nadav; Ofer, Dan; Peleg, Yam; Rappoport, Nadav; Linial, Michal |
| AuthorAffiliation | 1 School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel; 2 Department of Biological Chemistry, The Alexander Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel; 3 Deep Trading Ltd., Haifa 3508401, Israel; 4 Department of Software and Information Systems Engineering, Faculty of Engineering Sciences, Ben-Gurion University of the Negev, Beer Sheva 8410501, Israel |
| Author_xml | – sequence: 1 givenname: Nadav orcidid: 0000-0002-0510-2546 surname: Brandes fullname: Brandes, Nadav email: nadav.brandes@mail.huji.ac.il organization: School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel – sequence: 2 givenname: Dan surname: Ofer fullname: Ofer, Dan organization: Department of Biological Chemistry, The Alexander Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel – sequence: 3 givenname: Yam surname: Peleg fullname: Peleg, Yam organization: Deep Trading Ltd., Haifa 3508401, Israel – sequence: 4 givenname: Nadav orcidid: 0000-0002-7218-2558 surname: Rappoport fullname: Rappoport, Nadav organization: Department of Software and Information Systems Engineering, Faculty of Engineering Sciences, Ben-Gurion University of the Negev, Beer Sheva 8410501, Israel – sequence: 5 givenname: Michal orcidid: 0000-0002-9357-4526 surname: Linial fullname: Linial, Michal organization: Department of Biological Chemistry, The Alexander Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/35020807 (View this record in MEDLINE/PubMed) |
| Cites_doi | 10.1109/TKDE.2009.191 10.1016/S0022-2836(05)80360-2 10.1093/nar/gkt1223 10.1073/pnas.2016239118 10.1093/nar/gku1267 10.1093/bioinformatics/btm098 10.1093/nar/gkz1064 10.1016/j.csbj.2021.03.022 10.1016/S0006-3495(96)79210-X 10.1093/bioinformatics/btt725 10.1038/nature17995 10.1186/s12859-019-3220-8 10.1093/bioinformatics/btaa003 10.1109/TPAMI.2021.3095381 10.1038/75556 10.1007/978-1-4939-3167-5_2 10.1073/pnas.1914677117 10.1093/nar/25.17.3389 10.1038/s41592-019-0598-1 10.1093/nar/gkt1242 10.18653/v1/K19-1031 10.1002/prot.25415 10.1093/database/baw133 10.1093/bioinformatics/btv345 |
| ContentType | Journal Article |
| Copyright | The Author(s) 2022. Published by Oxford University Press. |
| DOI | 10.1093/bioinformatics/btac020 |
| DatabaseName | Oxford Journals Open Access Collection; CrossRef; MEDLINE; MEDLINE (Ovid); PubMed; MEDLINE - Academic; PubMed Central (Full Participant titles) |
| DatabaseTitle | CrossRef; MEDLINE; Medline Complete; MEDLINE with Full Text; PubMed; MEDLINE (Ovid); MEDLINE - Academic |
| DatabaseTitleList | MEDLINE; MEDLINE - Academic |
| Database_xml | – sequence: 1 dbid: NPM name: PubMed url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 2 dbid: TOX name: Oxford Journals Open Access Collection url: https://academic.oup.com/journals/ sourceTypes: Publisher – sequence: 3 dbid: 7X8 name: MEDLINE - Academic url: https://search.proquest.com/medline sourceTypes: Aggregation Database |
| Discipline | Biology |
| EISSN | 1460-2059 1367-4811 |
| EndPage | 2110 |
| ExternalDocumentID | PMC9386727 35020807 10_1093_bioinformatics_btac020 10.1093/bioinformatics/btac020 |
| Genre | Research Support, Non-U.S. Gov't Journal Article |
| GrantInformation_xml | – fundername: Israel Science Foundation (ISF) grantid: 2753/20 |
| ISICitedReferencesCount | 498 |
| ISSN | 1367-4803 1367-4811 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 8 |
| Language | English |
| License | This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
| Notes | The authors wish it to be known that, in their opinion, Nadav Brandes and Dan Ofer should be regarded as Joint First Authors. |
| ORCID | 0000-0002-9357-4526 0000-0002-0510-2546 0000-0002-7218-2558 |
| OpenAccessLink | https://dx.doi.org/10.1093/bioinformatics/btac020 |
| PMID | 35020807 |
| PageCount | 9 |
| PublicationCentury | 2000 |
| PublicationDate | 2022-04-12 |
| PublicationDateYYYYMMDD | 2022-04-12 |
| PublicationDate_xml | – month: 04 year: 2022 text: 2022-04-12 day: 12 |
| PublicationDecade | 2020 |
| PublicationPlace | England |
| PublicationPlace_xml | – name: England |
| PublicationTitle | Bioinformatics |
| PublicationTitleAlternate | Bioinformatics |
| PublicationYear | 2022 |
| Publisher | Oxford University Press |
| Publisher_xml | – name: Oxford University Press |
| StartPage | 2102 |
| SubjectTerms | Amino Acid Sequence; Deep Learning; Language; Natural Language Processing; Original Papers; Proteins - chemistry |
| Title | ProteinBERT: a universal deep-learning model of protein sequence and function |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/35020807 https://www.proquest.com/docview/2619545963 https://pubmed.ncbi.nlm.nih.gov/PMC9386727 |
| Volume | 38 |