ProteinBERT: a universal deep-learning model of protein sequence and function

Bibliographic details
Published in: Bioinformatics, Vol. 38, No. 8, pp. 2102-2110
Main authors: Brandes, Nadav; Ofer, Dan; Peleg, Yam; Rappoport, Nadav; Linial, Michal
Format: Journal Article
Language: English
Published: England: Oxford University Press, 12 April 2022
ISSN: 1367-4803, 1367-4811, 1460-2059
Online access: Full text
Abstract
Summary: Self-supervised deep language modeling has shown unprecedented success across natural language tasks, and has recently been repurposed to biological sequences. However, existing models and pretraining methods are designed and optimized for text analysis. We introduce ProteinBERT, a deep language model specifically designed for proteins. Our pretraining scheme combines language modeling with a novel task of Gene Ontology (GO) annotation prediction. We introduce novel architectural elements that make the model highly efficient and flexible to long sequences. The architecture of ProteinBERT consists of both local and global representations, allowing end-to-end processing of these types of inputs and outputs. ProteinBERT obtains near state-of-the-art performance, and sometimes exceeds it, on multiple benchmarks covering diverse protein properties (including protein structure, post-translational modifications and biophysical attributes), despite using a far smaller and faster model than competing deep-learning methods. Overall, ProteinBERT provides an efficient framework for rapidly training protein predictors, even with limited labeled data.
Availability and implementation: Code and pretrained model weights are available at https://github.com/nadavbra/protein_bert.
Supplementary information: Supplementary data are available at Bioinformatics online.
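The dual-task pretraining idea summarized in the abstract (language modeling on the sequence, combined with GO annotation prediction) can be pictured with a toy data-preparation sketch. This is not the authors' code and does not use the `protein_bert` package: the function names, the 0/1 annotation vector, and the corruption probabilities `p_replace` and `p_drop` are all hypothetical choices made for illustration. The sketch only shows how one training example might pair a corrupted per-residue ("local") input and a partially hidden per-protein ("global") input with their uncorrupted targets.

```python
import random

# The 20 standard amino acids; a real vocabulary would likely include extra tokens.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def corrupt_sequence(seq, p_replace, rng):
    """Replace each residue with a random amino acid with probability p_replace."""
    return "".join(
        rng.choice(AMINO_ACIDS) if rng.random() < p_replace else aa
        for aa in seq
    )


def corrupt_annotations(annotations, p_drop, rng):
    """Hide each positive (1) annotation with probability p_drop."""
    return [a if (a == 0 or rng.random() >= p_drop) else 0 for a in annotations]


def make_pretraining_example(seq, annotations, p_replace=0.05, p_drop=0.25, seed=0):
    """Build (inputs, targets) for a hypothetical dual-task pretraining step.

    p_replace and p_drop are illustrative assumptions, not values taken
    from this record.
    """
    rng = random.Random(seed)
    inputs = {
        "local": corrupt_sequence(seq, p_replace, rng),           # per-residue input
        "global": corrupt_annotations(annotations, p_drop, rng),  # per-protein input
    }
    # The model would be trained to recover the uncorrupted versions of both.
    targets = {"local": seq, "global": list(annotations)}
    return inputs, targets


if __name__ == "__main__":
    example_inputs, example_targets = make_pretraining_example(
        "MKTAYIAKQRQISFVKSHFSRQ", [1, 0, 1, 1, 0]
    )
    print(example_inputs)
    print(example_targets)
```

A model trained on such pairs would need one output head per input type, which is one way to read the abstract's "local and global representations"; the actual architecture is described in the paper itself.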
Author Linial, Michal
Brandes, Nadav
Rappoport, Nadav
Ofer, Dan
Peleg, Yam
AuthorAffiliation 1 School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
2 Department of Biological Chemistry, The Alexander Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
3 Deep Trading Ltd., Haifa 3508401, Israel
4 Department of Software and Information Systems Engineering, Faculty of Engineering Sciences, Ben-Gurion University of the Negev, Beer Sheva 8410501, Israel
Author_xml – sequence: 1
  givenname: Nadav
  orcidid: 0000-0002-0510-2546
  surname: Brandes
  fullname: Brandes, Nadav
  email: nadav.brandes@mail.huji.ac.il
  organization: School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
– sequence: 2
  givenname: Dan
  surname: Ofer
  fullname: Ofer, Dan
  organization: Department of Biological Chemistry, The Alexander Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
– sequence: 3
  givenname: Yam
  surname: Peleg
  fullname: Peleg, Yam
  organization: Deep Trading Ltd., Haifa 3508401, Israel
– sequence: 4
  givenname: Nadav
  orcidid: 0000-0002-7218-2558
  surname: Rappoport
  fullname: Rappoport, Nadav
  organization: Department of Software and Information Systems Engineering, Faculty of Engineering Sciences, Ben-Gurion University of the Negev, Beer Sheva 8410501, Israel
– sequence: 5
  givenname: Michal
  orcidid: 0000-0002-9357-4526
  surname: Linial
  fullname: Linial, Michal
  organization: Department of Biological Chemistry, The Alexander Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
BackLink https://www.ncbi.nlm.nih.gov/pubmed/35020807 (view this record in MEDLINE/PubMed)
ContentType Journal Article
Copyright The Author(s) 2022. Published by Oxford University Press.
DOI 10.1093/bioinformatics/btac020
Discipline Biology
EISSN 1460-2059
1367-4811
EndPage 2110
ExternalDocumentID PMC9386727
35020807
10_1093_bioinformatics_btac020
10.1093/bioinformatics/btac020
Genre Research Support, Non-U.S. Gov't
Journal Article
GrantInformation_xml – fundername: Israel Science Foundation (ISF)
  grantid: 2753/20
ISICitedReferencesCount 498
ISSN 1367-4803
1367-4811
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 8
Language English
License This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
https://creativecommons.org/licenses/by/4.0
The Author(s) 2022. Published by Oxford University Press.
Notes The authors wish it to be known that, in their opinion, Nadav Brandes and Dan Ofer should be regarded as Joint First Authors.
ORCID 0000-0002-9357-4526
0000-0002-0510-2546
0000-0002-7218-2558
OpenAccessLink https://dx.doi.org/10.1093/bioinformatics/btac020
PMID 35020807
PQID 2619545963
PQPubID 23479
PageCount 9
PublicationCentury 2000
PublicationDate 2022-04-12
PublicationDateYYYYMMDD 2022-04-12
PublicationDate_xml – month: 04
  year: 2022
  text: 2022-04-12
  day: 12
PublicationDecade 2020
PublicationPlace England
PublicationPlace_xml – name: England
PublicationTitle Bioinformatics
PublicationTitleAlternate Bioinformatics
PublicationYear 2022
Publisher Oxford University Press
Publisher_xml – name: Oxford University Press
StartPage 2102
SubjectTerms Amino Acid Sequence
Deep Learning
Language
Natural Language Processing
Original Papers
Proteins - chemistry
Title ProteinBERT: a universal deep-learning model of protein sequence and function
URI https://www.ncbi.nlm.nih.gov/pubmed/35020807
https://www.proquest.com/docview/2619545963
https://pubmed.ncbi.nlm.nih.gov/PMC9386727
Volume 38