BioSeq-BLM: a platform for analyzing DNA, RNA and protein sequences based on biological language models

Bibliographic Details
Published in: Nucleic acids research, Vol. 49, no. 22, p. e129
Main Authors: Li, Hong-Liang; Pang, Yi-He; Liu, Bin
Format: Journal Article
Language: English
Published: England: Oxford University Press, 16.12.2021
Subjects:
ISSN: 0305-1048, 1362-4962
Abstract In order to uncover the meanings of the ‘book of life’, 155 different biological language models (BLMs) for DNA, RNA and protein sequence analysis are discussed in this study, which are able to extract the linguistic properties of the ‘book of life’. We also extend the BLMs into a system called BioSeq-BLM for automatically representing and analyzing sequence data. Experimental results show that the predictors generated by BioSeq-BLM achieve comparable or even obviously better performance than the existing state-of-the-art predictors published in the literature, indicating that BioSeq-BLM will provide new approaches for biological sequence analysis based on natural language processing technologies, and contribute to the development of this very important field. To help readers use BioSeq-BLM for their own experiments, the corresponding web server and stand-alone package have been established and released, and can be freely accessed at http://bliulab.net/BioSeq-BLM/.
Author Li, Hong-Liang
Pang, Yi-He
Liu, Bin
Author_xml – sequence: 1
  givenname: Hong-Liang
  surname: Li
  fullname: Li, Hong-Liang
  email: bliu@bliulab.net
– sequence: 2
  givenname: Yi-He
  surname: Pang
  fullname: Pang, Yi-He
– sequence: 3
  givenname: Bin
  orcidid: 0000-0003-3685-9469
  surname: Liu
  fullname: Liu, Bin
  email: bliu@bliulab.net
BackLink https://www.ncbi.nlm.nih.gov/pubmed/34581805$$D View this record in MEDLINE/PubMed
CitedBy_id crossref_primary_10_1016_j_ymeth_2025_03_001
crossref_primary_10_1093_nar_gkac824
crossref_primary_10_1016_j_ymeth_2025_03_002
crossref_primary_10_1093_bfgp_elad007
crossref_primary_10_3389_fcell_2021_801113
crossref_primary_10_3389_fmed_2025_1503229
crossref_primary_10_1016_j_ijbiomac_2024_134146
crossref_primary_10_3389_fpls_2025_1618174
crossref_primary_10_1016_j_compbiomed_2022_105940
crossref_primary_10_1016_j_compbiolchem_2024_108212
crossref_primary_10_1016_j_csbj_2022_07_043
crossref_primary_10_1093_bib_bbac218
crossref_primary_10_1016_j_compbiolchem_2025_108612
crossref_primary_10_1109_TNNLS_2024_3419250
crossref_primary_10_1016_j_future_2024_05_029
crossref_primary_10_1093_bioadv_vbad043
crossref_primary_10_1093_bib_bbad261
crossref_primary_10_1038_s41598_024_72512_x
crossref_primary_10_1016_j_ijbiomac_2023_124993
crossref_primary_10_1021_acsomega_5c01924
crossref_primary_10_1007_s00299_024_03294_9
crossref_primary_10_1093_bib_bbaf447
crossref_primary_10_1093_bfgp_elad012
crossref_primary_10_1080_15592294_2022_2158284
crossref_primary_10_1186_s12915_024_01883_4
crossref_primary_10_1016_j_compbiomed_2022_105938
crossref_primary_10_1021_acs_jcim_5c01073
crossref_primary_10_3389_fgene_2021_801261
crossref_primary_10_1093_bioinformatics_btaf414
crossref_primary_10_1371_journal_pcbi_1011214
crossref_primary_10_1371_journal_pcbi_1012544
crossref_primary_10_1016_j_ymeth_2022_11_001
crossref_primary_10_3389_fdata_2021_727216
crossref_primary_10_3389_fmed_2025_1529335
crossref_primary_10_1016_j_ymeth_2024_05_012
crossref_primary_10_1093_bib_bbae469
crossref_primary_10_3389_fgene_2021_808856
crossref_primary_10_1016_j_ygeno_2025_111037
crossref_primary_10_1016_j_ymeth_2024_05_010
crossref_primary_10_1093_bib_bbae504
crossref_primary_10_26599_BDMA_2024_9020018
crossref_primary_10_1016_j_eswa_2024_125981
crossref_primary_10_1371_journal_pcbi_1010404
crossref_primary_10_1016_j_jmb_2024_168653
crossref_primary_10_1007_s11704_024_40072_y
crossref_primary_10_1186_s12915_025_02314_8
crossref_primary_10_1016_j_inffus_2025_103227
crossref_primary_10_1016_j_jmb_2025_168978
crossref_primary_10_1126_sciadv_adv0778
crossref_primary_10_1186_s12915_023_01803_y
crossref_primary_10_1016_j_compbiomed_2024_108963
crossref_primary_10_1093_bib_bbad212
crossref_primary_10_3389_fpls_2025_1626539
crossref_primary_10_1093_bib_bbac243
crossref_primary_10_1016_j_compbiomed_2022_105605
crossref_primary_10_1016_j_ijbiomac_2024_136940
crossref_primary_10_1007_s11432_024_4147_8
crossref_primary_10_1093_bioinformatics_btae581
crossref_primary_10_3390_genes15081090
crossref_primary_10_1016_j_future_2024_06_008
crossref_primary_10_1039_D5CP00785B
crossref_primary_10_1002_jcc_70111
crossref_primary_10_1093_bib_bbac236
crossref_primary_10_1109_TCBBIO_2025_3565912
crossref_primary_10_1016_j_ymthe_2022_05_001
crossref_primary_10_1007_s11432_024_4457_2
crossref_primary_10_1109_TCBB_2024_3425644
crossref_primary_10_3389_fgene_2024_1443532
crossref_primary_10_3390_ijms25137049
crossref_primary_10_1016_j_heliyon_2024_e41488
crossref_primary_10_1093_bib_bbaf189
crossref_primary_10_1016_j_ymeth_2022_01_004
crossref_primary_10_1186_s12915_024_02030_9
crossref_primary_10_1093_bib_bbae534
crossref_primary_10_1016_j_ymeth_2022_08_015
crossref_primary_10_1002_pmic_202400044
crossref_primary_10_1093_bib_bbaf061
crossref_primary_10_1038_s42003_024_07411_y
crossref_primary_10_1016_j_compbiomed_2024_108339
crossref_primary_10_1016_j_compbiolchem_2024_108282
crossref_primary_10_1016_j_ymeth_2025_04_011
crossref_primary_10_1016_j_compbiomed_2022_106523
crossref_primary_10_1016_j_compbiomed_2022_106489
crossref_primary_10_1016_j_ymeth_2024_09_010
crossref_primary_10_3390_foods14122014
crossref_primary_10_3390_ijms26062468
crossref_primary_10_1016_j_ymeth_2024_09_017
crossref_primary_10_1186_s12915_025_02206_x
crossref_primary_10_3390_ijms242216496
crossref_primary_10_1016_j_ymeth_2024_12_009
crossref_primary_10_3389_fgene_2021_827161
crossref_primary_10_1016_j_cels_2023_05_007
crossref_primary_10_1016_j_ab_2025_115968
crossref_primary_10_1093_bib_bbae169
crossref_primary_10_1109_TCBB_2022_3183018
crossref_primary_10_1093_bib_bbac265
crossref_primary_10_3389_fmicb_2022_790063
crossref_primary_10_1016_j_compbiomed_2024_108229
crossref_primary_10_1016_j_ymeth_2024_06_012
crossref_primary_10_1016_j_ijbiomac_2022_12_315
crossref_primary_10_3390_info15030163
crossref_primary_10_1016_j_compbiomed_2024_107937
crossref_primary_10_1016_j_compbiomed_2023_106711
crossref_primary_10_1016_j_ijbiomac_2025_146849
crossref_primary_10_3389_fgene_2024_1369811
crossref_primary_10_1093_bib_bbad347
crossref_primary_10_1109_TCBB_2022_3150280
crossref_primary_10_1038_s41467_022_32007_7
crossref_primary_10_3389_fgene_2021_818841
crossref_primary_10_3389_fcell_2021_803608
crossref_primary_10_1016_j_ijbiomac_2024_130659
crossref_primary_10_1016_j_ymeth_2022_10_008
crossref_primary_10_1002_pmic_202200409
crossref_primary_10_2174_1574893618666230516144641
crossref_primary_10_1002_aic_17781
crossref_primary_10_1049_syb2_12098
crossref_primary_10_1186_s12915_024_01968_0
crossref_primary_10_1016_j_compbiomed_2024_107941
crossref_primary_10_3390_ijms25189844
crossref_primary_10_1049_syb2_12090
crossref_primary_10_1016_j_compbiomed_2022_105577
crossref_primary_10_1109_TCBB_2021_3136905
crossref_primary_10_1016_j_compbiomed_2023_107654
crossref_primary_10_1016_j_future_2024_06_039
crossref_primary_10_1186_s12967_024_05706_6
crossref_primary_10_3390_ijms26146701
crossref_primary_10_3389_fgene_2022_952649
crossref_primary_10_1093_nar_gkad055
crossref_primary_10_1049_syb2_12104
crossref_primary_10_1016_j_future_2025_107801
crossref_primary_10_1021_acs_jcim_5c00895
crossref_primary_10_1016_j_compbiomed_2024_108484
crossref_primary_10_1007_s11432_024_4171_9
crossref_primary_10_1093_bib_bbad093
crossref_primary_10_1186_s12915_024_02064_z
crossref_primary_10_1016_j_ymeth_2024_01_017
crossref_primary_10_1021_acs_jcim_5c00530
crossref_primary_10_1371_journal_pcbi_1011450
crossref_primary_10_1016_j_compbiomed_2024_108129
crossref_primary_10_1016_j_compbiomed_2024_108249
crossref_primary_10_1038_s42003_025_07615_w
crossref_primary_10_1021_acs_jcim_4c00003
crossref_primary_10_1109_TCBBIO_2024_3524677
crossref_primary_10_1002_prca_70009
crossref_primary_10_1016_j_compbiomed_2024_108408
crossref_primary_10_1093_bib_bbad251
crossref_primary_10_1371_journal_pcbi_1010511
crossref_primary_10_1016_j_jmb_2022_167604
crossref_primary_10_3389_fgene_2022_857839
crossref_primary_10_1049_syb2_12105
crossref_primary_10_1016_j_jmb_2024_168741
crossref_primary_10_1016_j_chemolab_2024_105239
crossref_primary_10_1093_bib_bbaf024
crossref_primary_10_1093_bib_bbaf026
crossref_primary_10_2174_0115748936299044240202100019
crossref_primary_10_1016_j_compbiomed_2024_108534
crossref_primary_10_1021_acs_biochem_5c00237
crossref_primary_10_3389_fgene_2022_768971
Cites_doi 10.1093/bioinformatics/bti1047
10.1093/bioinformatics/btr565
10.1016/j.ab.2014.04.001
10.1093/bib/bbz133
10.1186/s12863-018-0633-8
10.1093/nar/gkn159
10.1080/00437956.1954.11659520
10.1371/journal.pone.0168288
10.1016/S0968-0004(98)01298-5
10.1002/bip.360270308
10.1023/A:1007091128394
10.1002/ajpa.20250
10.1016/j.eswa.2010.08.066
10.1093/bioinformatics/btv042
10.1016/j.gpb.2019.01.004
10.1038/nature01255
10.1093/nar/gkv458
10.1162/neco.1997.9.8.1735
10.1016/j.compeleceng.2013.11.024
10.1109/TIT.1956.1056813
10.1006/csla.2001.0174
10.1002/prot.10081
10.1371/journal.pone.0046633
10.1145/2133806.2133826
10.1093/nar/gkn597
10.1613/jair.953
10.1016/j.gde.2015.08.010
10.1093/nar/gkz740
10.1080/01638539809545028
10.1038/s41586-019-1923-7
10.1038/nbt.3300
10.1016/j.jmb.2020.09.008
10.1093/bioinformatics/bti687
10.3115/v1/D14-1162
10.1142/9781848162648_0011
10.1093/bioinformatics/btu624
10.1093/nar/gkab122
10.1073/pnas.1814684116
10.1093/bioinformatics/17.9.763
10.1093/bioinformatics/btu602
10.1002/j.1538-7305.1948.tb01338.x
10.1002/minf.201400025
10.1038/s41592-019-0360-8
10.1002/(SICI)1098-111X(200007)15:7<633::AID-INT4>3.0.CO;2-8
10.1093/bioinformatics/btw186
10.1007/978-3-642-40270-8_4
10.1186/s12859-015-0544-x
10.1002/jmr.1061
10.1093/bioinformatics/btw678
10.1038/d41586-020-03348-4
10.1038/s41587-019-0140-0
10.1093/nar/gkm998
10.1093/bioinformatics/btz825
10.1016/j.ymssp.2006.05.004
10.1093/bioinformatics/17.7.579
10.1093/bioinformatics/btaa534
10.1145/1961189.1961199
10.1109/ACCESS.2019.2952237
10.1186/1471-2105-9-510
10.18653/v1/2020.acl-main.18
10.1371/journal.pcbi.1000134
10.1038/s41467-020-17155-y
10.1007/BF00818163
10.1371/journal.pone.0121501
10.3390/ijms19092483
10.1093/bib/bbx165
10.1371/journal.pone.0153268
10.1145/3055635.3056643
10.1093/bib/bbz041
10.1093/bioinformatics/btg431
10.1093/nar/gku1019
10.1186/s12859-017-1792-8
10.1093/bioinformatics/btw730
10.1016/j.patcog.2006.09.012
10.1109/29.32278
10.1186/1471-2105-15-S16-S3
10.1002/jmri.24365
10.3389/fgene.2018.00495
10.1109/4233.966104
10.1561/2200000013
10.1093/bioinformatics/btg185
10.1093/nar/gkz203
10.1093/bib/bbz098
10.1093/bioinformatics/btp500
10.1145/331499.331504
10.1038/nrm1785
10.3115/v1/D14-1179
10.1016/j.dss.2012.01.016
10.1126/science.1136800
ContentType Journal Article
Copyright The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research. 2021
The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research.
Copyright_xml – notice: The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research. 2021
– notice: The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research.
DBID TOX
AAYXX
CITATION
CGR
CUY
CVF
ECM
EIF
NPM
7X8
5PM
DOI 10.1093/nar/gkab829
DatabaseName Oxford Journals Open Access Collection
CrossRef
Medline
MEDLINE
MEDLINE (Ovid)
MEDLINE
MEDLINE
PubMed
MEDLINE - Academic
PubMed Central (Full Participant titles)
DatabaseTitle CrossRef
MEDLINE
Medline Complete
MEDLINE with Full Text
PubMed
MEDLINE (Ovid)
MEDLINE - Academic
DatabaseTitleList
MEDLINE

MEDLINE - Academic
CrossRef
Database_xml – sequence: 1
  dbid: NPM
  name: PubMed
  url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
– sequence: 2
  dbid: TOX
  name: Oxford Journals Open Access Collection
  url: https://academic.oup.com/journals/
  sourceTypes: Publisher
– sequence: 3
  dbid: 7X8
  name: MEDLINE - Academic
  url: https://search.proquest.com/medline
  sourceTypes: Aggregation Database
DeliveryMethod fulltext_linktorsrc
Discipline Anatomy & Physiology
Chemistry
EISSN 1362-4962
EndPage e129
ExternalDocumentID PMC8682797
34581805
10_1093_nar_gkab829
10.1093/nar/gkab829
Genre Research Support, Non-U.S. Gov't
Journal Article
GrantInformation_xml – fundername: ;
  grantid: JQ19019
– fundername: ;
  grantid: 61822306; 61861146002; 61732012
– fundername: ;
  grantid: 2018AAA0100100
GroupedDBID ---
-DZ
-~X
.55
.GJ
.I3
0R~
123
18M
1TH
29N
2WC
3O-
4.4
482
53G
5VS
5WA
6.Y
70E
85S
A8Z
AAFWJ
AAHBH
AAMVS
AAOGV
AAPPN
AAPXW
AAUQX
AAVAP
AAWDT
AAYJJ
ABPTD
ABQLI
ABQTQ
ABSAR
ABSMQ
ABXVV
ACFRR
ACGFO
ACGFS
ACIPB
ACIWK
ACMRT
ACNCT
ACPQN
ACPRK
ACUTJ
ACZBC
ADBBV
ADHZD
AEGXH
AEKPW
AENEX
AENZO
AFFNX
AFPKN
AFRAH
AFSHK
AFULF
AFYAG
AGKRT
AGMDO
AHMBA
AIAGR
ALMA_UNASSIGNED_HOLDINGS
ALUQC
ANFBD
AOIJS
AQDSO
ASAOO
ASPBG
ATDFG
ATTQO
AVWKF
AZFZN
BAWUL
BAYMD
BCNDV
BEYMZ
BTTYL
C1A
CAG
CIDKT
COF
CS3
CXTWN
CZ4
D0S
DFGAJ
DIK
DU5
D~K
E3Z
EBD
EBS
EJD
ELUNK
EMOBN
ESTFP
F20
F5P
FEDTE
GROUPED_DOAJ
GX1
H13
HH5
HVGLF
HYE
HZ~
H~9
IH2
KAQDR
KC5
KQ8
KSI
M49
MBTAY
MVM
M~E
NTWIH
NU-
OAWHX
OBC
OBS
OEB
OES
OJQWA
OVD
O~Y
P2P
PB-
PEELM
PQQKQ
QBD
R44
RD5
RNI
RNS
ROL
ROX
ROZ
RPM
RXO
RZF
RZO
SJN
SV3
TCN
TEORI
TN5
TOX
TR2
UHB
WG7
WOQ
X7H
X7M
XSB
XSW
YSK
ZKX
ZXP
~91
~D7
~KM
AAYXX
ABEJV
ABGNP
AMNDL
CITATION
OVT
ADIXU
CGR
CUY
CVF
ECM
EIF
NPM
7X8
5PM
ID FETCH-LOGICAL-c478t-30eb6ecff44ef6bfb6cecede1328e10e040610e8dd67e8ae88937bce329a06c3
IEDL.DBID TOX
ISICitedReferencesCount 185
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000736046000003&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
ISSN 0305-1048
1362-4962
IngestDate Tue Sep 30 15:32:42 EDT 2025
Thu Oct 02 11:15:38 EDT 2025
Wed Feb 19 02:27:57 EST 2025
Sat Nov 29 03:25:10 EST 2025
Tue Nov 18 22:18:21 EST 2025
Wed Aug 28 03:17:06 EDT 2024
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 22
Language English
License This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
https://creativecommons.org/licenses/by-nc/4.0
The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research.
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-c478t-30eb6ecff44ef6bfb6cecede1328e10e040610e8dd67e8ae88937bce329a06c3
Notes ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
ORCID 0000-0003-3685-9469
OpenAccessLink https://dx.doi.org/10.1093/nar/gkab829
PMID 34581805
PQID 2577458303
PQPubID 23479
ParticipantIDs pubmedcentral_primary_oai_pubmedcentral_nih_gov_8682797
proquest_miscellaneous_2577458303
pubmed_primary_34581805
crossref_citationtrail_10_1093_nar_gkab829
crossref_primary_10_1093_nar_gkab829
oup_primary_10_1093_nar_gkab829
PublicationCentury 2000
PublicationDate 2021-12-16
PublicationDateYYYYMMDD 2021-12-16
PublicationDate_xml – month: 12
  year: 2021
  text: 2021-12-16
  day: 16
PublicationDecade 2020
PublicationPlace England
PublicationPlace_xml – name: England
PublicationTitle Nucleic acids research
PublicationTitleAlternate Nucleic Acids Res
PublicationYear 2021
Publisher Oxford University Press
Publisher_xml – name: Oxford University Press
References Hanson (2021121718030776200_B25) 2019; 17
Chen (2021121718030776200_B18) 2019; 16
Pennington (2021121718030776200_B54) 2014
Noble (2021121718030776200_B35) 2005; 21
Frey (2021121718030776200_B84) 2007; 315
Mittelman (2021121718030776200_B62) 2003; 19
Laboulais (2021121718030776200_B65) 2002; 47
Liu (2021121718030776200_B96) 2015; 10
Chen (2021121718030776200_B69) 2021; 49
Bilgic (2021121718030776200_B81) 2014; 40
Bengio (2021121718030776200_B51) 2003; 3
Ye (2021121718030776200_B60) 2011; 27
Zhang (2021121718030776200_B30) 2011; 38
Hanson (2021121718030776200_B57) 2017; 33
Yeung (2021121718030776200_B92) 2001; 17
Sokal (2021121718030776200_B106) 2006; 129
Strauss (2021121718030776200_B63) 2017; 12
Liu (2021121718030776200_B27) 2012; 7
Zhou (2021121718030776200_B109) 2017; 18
Blei (2021121718030776200_B32) 2012; 55
Senior (2021121718030776200_B9) 2020; 577
Lebret (2021121718030776200_B58) 2015
Joulin (2021121718030776200_B55) 2017; 2
Guyon (2021121718030776200_B89) 2003; 3
Ahmed (2021121718030776200_B74) 2017
Leslie (2021121718030776200_B37) 2004; 20
Liu (2021121718030776200_B76) 2018; 19
Ramos (2021121718030776200_B45) 2003
Wei (2021121718030776200_B94) 2001; 5
Cho (2021121718030776200_B72) 2014
Li (2021121718030776200_B59) 2020; 21
Zhang (2021121718030776200_B100) 2017; 33
Liu (2021121718030776200_B98) 2015; 34
Liu (2021121718030776200_B11) 2019; 20
Cao (2021121718030776200_B15) 2015; 31
Chen (2021121718030776200_B13) 2019; 21
Gupta (2021121718030776200_B34) 2008; 4
Harris (2021121718030776200_B44) 1954; 10
Friedel (2021121718030776200_B7) 2009; 37
Pedregosa (2021121718030776200_B82) 2011; 12
Gimona (2021121718030776200_B5) 2006; 7
Avsec (2021121718030776200_B16) 2019; 37
Chen (2021121718030776200_B108) 2019; 7
Callaway (2021121718030776200_B101) 2020; 588
Farquad (2021121718030776200_B78) 2012; 53
Kitaev (2021121718030776200_B75) 2020
Horne (2021121718030776200_B105) 1988; 27
Lodhi (2021121718030776200_B39) 2002; 2
Mikolov (2021121718030776200_B53) 2013
Chen (2021121718030776200_B33) 2014; 456
Liu (2021121718030776200_B41) 2015; 43
Kawashima (2021121718030776200_B6) 2008; 36
Zhang (2021121718030776200_B23) 2020; 432
Pereira (2021121718030776200_B19) 2015; 16
Lin (2021121718030776200_B40) 2014; 42
Junsomboon (2021121718030776200_B79) 2017
Yu (2021121718030776200_B3) 2019; 116
Rangwala (2021121718030776200_B61) 2005; 21
Altschul (2021121718030776200_B102) 1998; 23
Dong (2021121718030776200_B24) 2009; 25
Harris (2021121718030776200_B52) 1954; 10
Darst (2021121718030776200_B90) 2018; 19
Wang (2021121718030776200_B66) 2020; 36
Kumar (2021121718030776200_B99) 2011; 24
Searls (2021121718030776200_B1) 2002; 420
Qiang (2021121718030776200_B28) 2018; 9
Luo (2021121718030776200_B38) 2016; 11
Schmidt (2021121718030776200_B80) 2007
Ke (2021121718030776200_B103) 2020; 36
Kim (2021121718030776200_B86) 2007; 40
Vaswani (2021121718030776200_B73) 2017
Landauer (2021121718030776200_B48) 1998; 25
Chandrashekar (2021121718030776200_B88) 2014; 40
El-Manzalawy (2021121718030776200_B36) 2008; 7
Schölkopf (2021121718030776200_B93) 1997
Liu (2021121718030776200_B12) 2019; 47
Sutton (2021121718030776200_B70) 2012; 4
Chen (2021121718030776200_B8) 2020
Sugumaran (2021121718030776200_B91) 2007; 21
Mihalcea (2021121718030776200_B31) 2004
Bahl (2021121718030776200_B29) 1989; 37
Ramage (2021121718030776200_B50) 2009
Kopp (2021121718030776200_B17) 2020; 11
Liu (2021121718030776200_B42) 2008; 9
Bressin (2021121718030776200_B46) 2019; 47
Biau (2021121718030776200_B68) 2012; 13
Chang (2021121718030776200_B67) 2011; 2
Hochreiter (2021121718030776200_B71) 1997; 9
Ester (2021121718030776200_B85) 1996
Xiao (2021121718030776200_B14) 2015; 31
Bari (2021121718030776200_B26) 2013
Scaiewicz (2021121718030776200_B2) 2015; 35
Chawla (2021121718030776200_B77) 2002; 16
Liu (2021121718030776200_B95) 2016; 32
Alipanahi (2021121718030776200_B10) 2015; 33
Guo (2021121718030776200_B47) 2008; 36
Chomsky (2021121718030776200_B22) 1956; 2
Chen (2021121718030776200_B104) 2015; 31
Liu (2021121718030776200_B43) 2014; 15
Blei (2021121718030776200_B49) 2003; 3
Liu (2021121718030776200_B56) 2020; 21
Feng (2021121718030776200_B107) 2000; 19
Shannon (2021121718030776200_B20) 1948; 27
Skarmeta (2021121718030776200_B87) 2000; 15
Searls (2021121718030776200_B4) 2001; 17
Hofacker (2021121718030776200_B97) 1994; 125
Goodman (2021121718030776200_B21) 2001; 15
Jain (2021121718030776200_B83) 1999; 31
Weinberger (2021121718030776200_B64) 2009; 10
References_xml – year: 2020
  ident: 2021121718030776200_B75
  article-title: Reformer: the efficient transformer
– volume: 10
  start-page: 207
  year: 2009
  ident: 2021121718030776200_B64
  article-title: Distance metric learning for large margin nearest neighbor classification
  publication-title: J. Mach. Learn. Res.
– volume: 21
  start-page: I338
  year: 2005
  ident: 2021121718030776200_B35
  article-title: Predicting the in vivo signature of human gene regulatory sequences
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/bti1047
– volume: 27
  start-page: 3356
  year: 2011
  ident: 2021121718030776200_B60
  article-title: An assessment of substitution scores for protein profile-profile comparison
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btr565
– volume: 456
  start-page: 53
  year: 2014
  ident: 2021121718030776200_B33
  article-title: PseKNC: a flexible web server for generating pseudo K-tuple nucleotide composition
  publication-title: Anal. Biochem.
  doi: 10.1016/j.ab.2014.04.001
– volume: 3
  start-page: 993
  year: 2003
  ident: 2021121718030776200_B49
  article-title: Latent Dirichlet allocation
  publication-title: J. Mach. Learn. Res.
– volume: 21
  start-page: 2133
  year: 2020
  ident: 2021121718030776200_B59
  article-title: MotifCNN-fold: protein fold recognition based on fold-specific features extracted by motif-based convolutional neural networks
  publication-title: Brief. Bioinform.
  doi: 10.1093/bib/bbz133
– volume: 19
  start-page: 353
  year: 2018
  ident: 2021121718030776200_B90
  article-title: Using recursive feature elimination in random forest to account for correlated variables in high dimensional data
  publication-title: BMC Genet.
  doi: 10.1186/s12863-018-0633-8
– volume: 36
  start-page: 3025
  year: 2008
  ident: 2021121718030776200_B47
  article-title: Using support vector machine combined with auto covariance to predict protein-protein interactions from protein sequences
  publication-title: Nucleic Acids Res.
  doi: 10.1093/nar/gkn159
– volume: 10
  start-page: 146
  year: 1954
  ident: 2021121718030776200_B44
  article-title: Distributional structure
  publication-title: Word
  doi: 10.1080/00437956.1954.11659520
– volume: 12
  start-page: e0168288
  year: 2017
  ident: 2021121718030776200_B63
  article-title: Generalising Ward's method for use with Manhattan distances
  publication-title: PLoS One
  doi: 10.1371/journal.pone.0168288
– year: 2015
  ident: 2021121718030776200_B58
  article-title: “The Sum of Its Parts”: joint learning of word and phrase representations with autoencoders
– volume: 23
  start-page: 444
  year: 1998
  ident: 2021121718030776200_B102
  article-title: Iterated profile searches with PSI-BLAST - a tool for discovery in protein databases
  publication-title: Trends Biochem. Sci.
  doi: 10.1016/S0968-0004(98)01298-5
– volume: 27
  start-page: 451
  year: 1988
  ident: 2021121718030776200_B105
  article-title: Prediction of protein helix content from an auto-correlation analysis of sequence hydrophobicities
  publication-title: Biopolymers
  doi: 10.1002/bip.360270308
– year: 2013
  ident: 2021121718030776200_B53
  article-title: Efficient estimation of word representations in vector space
– volume: 19
  start-page: 269
  year: 2000
  ident: 2021121718030776200_B107
  article-title: Prediction of membrane protein types based on the hydrophobic index of amino acids
  publication-title: J. Protein Chem.
  doi: 10.1023/A:1007091128394
– volume: 129
  start-page: 121
  year: 2006
  ident: 2021121718030776200_B106
  article-title: Population structure inferred by local spatial autocorrelation: an example from an Amerindian tribal population
  publication-title: Am. J. Phys. Anthropol.
  doi: 10.1002/ajpa.20250
– volume: 38
  start-page: 2758
  year: 2011
  ident: 2021121718030776200_B30
  article-title: A comparative study of TF*IDF, LSI and multi-words for text classification
  publication-title: Expert Syst. Appl.
  doi: 10.1016/j.eswa.2010.08.066
– volume: 31
  start-page: 1857
  year: 2015
  ident: 2021121718030776200_B14
  article-title: protr/ProtrWeb: R package and web server for generating various numerical representation schemes of protein sequences
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btv042
– volume: 17
  start-page: 645
  year: 2019
  ident: 2021121718030776200_B25
  article-title: SPOT-Disorder2: improved protein intrinsic disorder prediction by ensembled deep learning
  publication-title: Genomics Proteomics Bioinformatics
  doi: 10.1016/j.gpb.2019.01.004
– volume: 420
  start-page: 211
  year: 2002
  ident: 2021121718030776200_B1
  article-title: The language of genes
  publication-title: Nature
  doi: 10.1038/nature01255
– volume: 43
  start-page: W65
  year: 2015
  ident: 2021121718030776200_B41
  article-title: Pse-in-One: a web server for generating various modes of pseudo components of DNA, RNA, and protein sequences
  publication-title: Nucleic Acids Res.
  doi: 10.1093/nar/gkv458
– volume: 9
  start-page: 1735
  year: 1997
  ident: 2021121718030776200_B71
  article-title: Long short-term memory
  publication-title: Neural Comput.
  doi: 10.1162/neco.1997.9.8.1735
– volume: 40
  start-page: 16
  year: 2014
  ident: 2021121718030776200_B88
  article-title: A survey on feature selection methods
  publication-title: Comput. Electr. Eng.
  doi: 10.1016/j.compeleceng.2013.11.024
– volume: 2
  start-page: 113
  year: 1956
  ident: 2021121718030776200_B22
  article-title: Three models for the description of language
  publication-title: IRE Trans. Inf. Theory
  doi: 10.1109/TIT.1956.1056813
– volume: 15
  start-page: 403
  year: 2001
  ident: 2021121718030776200_B21
  article-title: A bit of progress in language modeling
  publication-title: Comput. Speech Lang.
  doi: 10.1006/csla.2001.0174
– volume: 47
  start-page: 169
  year: 2002
  ident: 2021121718030776200_B65
  article-title: Hamming distance geometry of a protein conformational space: application to the clustering of a 4-ns molecular dynamics trajectory of the HIV-1 integrase catalytic core
  publication-title: Proteins-Struct. Funct. Genet.
  doi: 10.1002/prot.10081
– volume: 7
  start-page: e46633
  year: 2012
  ident: 2021121718030776200_B27
  article-title: Using amino acid physicochemical distance transformation for fast protein remote homology detection
  publication-title: PLoS One
  doi: 10.1371/journal.pone.0046633
– volume: 55
  start-page: 77
  year: 2012
  ident: 2021121718030776200_B32
  article-title: Probabilistic topic models
  publication-title: Commun. ACM
  doi: 10.1145/2133806.2133826
– volume: 37
  start-page: D37
  year: 2009
  ident: 2021121718030776200_B7
  article-title: DiProDB: a database for dinucleotide properties
  publication-title: Nucleic Acids Res.
  doi: 10.1093/nar/gkn597
– volume: 16
  start-page: 321
  year: 2002
  ident: 2021121718030776200_B77
  article-title: SMOTE: synthetic minority over-sampling technique
  publication-title: J. Artif. Intell. Res.
  doi: 10.1613/jair.953
– volume: 35
  start-page: 50
  year: 2015
  ident: 2021121718030776200_B2
  article-title: The language of the protein universe
  publication-title: Curr. Opin. Genet. Dev.
  doi: 10.1016/j.gde.2015.08.010
– volume: 47
  start-page: e127
  year: 2019
  ident: 2021121718030776200_B12
  article-title: BioSeq-Analysis2.0: an updated platform for analyzing DNA, RNA and protein sequences at sequence level and residue level based on machine learning approaches
  publication-title: Nucleic Acids Res.
  doi: 10.1093/nar/gkz740
– volume: 25
  start-page: 259
  year: 1998
  ident: 2021121718030776200_B48
  article-title: An introduction to latent semantic analysis
  publication-title: Discourse Processes
  doi: 10.1080/01638539809545028
– volume: 577
  start-page: 706
  year: 2020
  ident: 2021121718030776200_B9
  article-title: Improved protein structure prediction using potentials from deep learning
  publication-title: Nature
  doi: 10.1038/s41586-019-1923-7
– volume: 33
  start-page: 831
  year: 2015
  ident: 2021121718030776200_B10
  article-title: Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning
  publication-title: Nat. Biotechnol.
  doi: 10.1038/nbt.3300
– volume: 432
  start-page: 5860
  year: 2020
  ident: 2021121718030776200_B23
  article-title: iDRBP_MMC: identifying DNA-binding proteins and RNA-binding proteins based on multi-label learning model and motif-based convolutional neural network
  publication-title: J. Mol. Biol.
  doi: 10.1016/j.jmb.2020.09.008
– volume: 2
  start-page: 419
  year: 2002
  ident: 2021121718030776200_B39
  article-title: Text classification using string kernels
  publication-title: J. Mach. Learn. Res.
– volume: 13
  start-page: 1063
  year: 2012
  ident: 2021121718030776200_B68
  article-title: Analysis of a random forests model
  publication-title: J. Mach. Learn. Res.
– volume: 21
  start-page: 4239
  year: 2005
  ident: 2021121718030776200_B61
  article-title: Profile-based direct kernels for remote homology detection and fold recognition
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/bti687
– volume: 3
  start-page: 1157
  year: 2003
  ident: 2021121718030776200_B89
  article-title: An introduction to variable and feature selection
  publication-title: J. Mach. Learn. Res.
– start-page: 1532
  year: 2014
  ident: 2021121718030776200_B54
  article-title: GloVe: Global vectors for word representation
  publication-title: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing
  doi: 10.3115/v1/D14-1162
– volume: 7
  start-page: 121
  year: 2008
  ident: 2021121718030776200_B36
  article-title: Predicting flexible length linear B-cell epitopes
  publication-title: Comput. Syst. Bioinformatics Conf.
  doi: 10.1142/9781848162648_0011
– volume: 31
  start-page: 279
  year: 2015
  ident: 2021121718030776200_B15
  article-title: Rcpi: R/Bioconductor package to generate various descriptors of proteins, compounds and their interactions
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btu624
– volume: 49
  start-page: e60
  year: 2021
  ident: 2021121718030776200_B69
  article-title: iLearnPlus: a comprehensive and automated machine-learning platform for nucleic acid and protein sequence analysis, prediction and visualization
  publication-title: Nucleic Acids Res.
  doi: 10.1093/nar/gkab122
– volume: 116
  start-page: 3636
  year: 2019
  ident: 2021121718030776200_B3
  article-title: Grammar of protein domain architectures
  publication-title: Proc. Natl. Acad. Sci. U.S.A.
  doi: 10.1073/pnas.1814684116
– start-page: 133
  volume-title: Proceedings of the First Instructional Conference on Machine Learning
  year: 2003
  ident: 2021121718030776200_B45
  article-title: Using tf-idf to determine word relevance in document queries
– volume: 17
  start-page: 763
  year: 2001
  ident: 2021121718030776200_B92
  article-title: Principal component analysis for clustering gene expression data
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/17.9.763
– volume: 31
  start-page: 119
  year: 2015
  ident: 2021121718030776200_B104
  article-title: PseKNC-General: a cross-platform package for generating various modes of pseudo nucleotide compositions
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btu602
– volume: 27
  start-page: 379
  year: 1948
  ident: 2021121718030776200_B20
  article-title: A mathematical theory of communication
  publication-title: Bell Syst. Tech. J.
  doi: 10.1002/j.1538-7305.1948.tb01338.x
– volume: 34
  start-page: 8
  year: 2015
  ident: 2021121718030776200_B98
  article-title: PseDNA-Pro: DNA-binding protein identification by combining Chou's PseAAC and physicochemical distance transformation
  publication-title: Mol. Inf.
  doi: 10.1002/minf.201400025
– volume: 16
  start-page: 315
  year: 2019
  ident: 2021121718030776200_B18
  article-title: Selene: a PyTorch-based deep learning library for sequence data
  publication-title: Nat. Methods
  doi: 10.1038/s41592-019-0360-8
– volume: 15
  start-page: 633
  year: 2000
  ident: 2021121718030776200_B87
  article-title: Data mining for text categorization with semi-supervised agglomerative hierarchical clustering
  publication-title: Int. J. Intell. Syst.
  doi: 10.1002/(SICI)1098-111X(200007)15:7<633::AID-INT4>3.0.CO;2-8
– volume: 32
  start-page: 2411
  year: 2016
  ident: 2021121718030776200_B95
  article-title: iDHS-EL: identifying DNase I hypersensitive sites by fusing three different modes of pseudo nucleotide composition into an ensemble learning framework
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btw186
– start-page: 46
  volume-title: Proceedings of the 18th International Conference on Database Systems for Advanced Applications
  year: 2013
  ident: 2021121718030776200_B26
  article-title: DNA Encoding for Splice Site Prediction in Large DNA Sequence
  doi: 10.1007/978-3-642-40270-8_4
– volume: 16
  start-page: 142
  year: 2015
  ident: 2021121718030776200_B19
  article-title: Pydna: a simulation and documentation tool for DNA assembly strategies using python
  publication-title: BMC Bioinformatics
  doi: 10.1186/s12859-015-0544-x
– volume: 24
  start-page: 303
  year: 2011
  ident: 2021121718030776200_B99
  article-title: SVM based prediction of RNA-binding proteins using binding residues and evolutionary information
  publication-title: J. Mol. Recognit.
  doi: 10.1002/jmr.1061
– volume: 33
  start-page: 685
  year: 2017
  ident: 2021121718030776200_B57
  article-title: Improving protein disorder prediction by deep bidirectional long short-term memory recurrent neural networks
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btw678
– volume: 588
  start-page: 203
  year: 2020
  ident: 2021121718030776200_B101
  article-title: ‘It will change everything’: DeepMind's AI makes gigantic leap in solving protein structures
  publication-title: Nature
  doi: 10.1038/d41586-020-03348-4
– volume: 37
  start-page: 592
  year: 2019
  ident: 2021121718030776200_B16
  article-title: The Kipoi repository accelerates community exchange and reuse of predictive models for genomics
  publication-title: Nat. Biotechnol.
  doi: 10.1038/s41587-019-0140-0
– volume: 36
  start-page: D202
  year: 2008
  ident: 2021121718030776200_B6
  article-title: AAindex: amino acid index database, progress report 2008
  publication-title: Nucleic Acids Res.
  doi: 10.1093/nar/gkm998
– start-page: 6000
  volume-title: Proceedings of the 31st International Conference on Neural Information Processing Systems
  year: 2017
  ident: 2021121718030776200_B73
  article-title: Attention is all you need
– volume: 36
  start-page: 4038
  year: 2020
  ident: 2021121718030776200_B66
  article-title: An efficient approach based on multi-sources information to predict circRNA-disease associations using deep convolutional neural network
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btz825
– volume: 21
  start-page: 930
  year: 2007
  ident: 2021121718030776200_B91
  article-title: Feature selection using decision tree and classification through proximal support vector machine for fault diagnostics of roller bearing
  publication-title: Mech. Syst. Signal Process.
  doi: 10.1016/j.ymssp.2006.05.004
– volume: 17
  start-page: 579
  year: 2001
  ident: 2021121718030776200_B4
  article-title: Reading the book of life
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/17.7.579
– volume: 36
  start-page: 4576
  year: 2020
  ident: 2021121718030776200_B103
  article-title: Accurate prediction of genome-wide RNA secondary structure profile based on extreme gradient boosting
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btaa534
– volume: 2
  start-page: 27
  year: 2011
  ident: 2021121718030776200_B67
  article-title: LIBSVM: a library for support vector machines
  publication-title: ACM Trans. Intell. Syst. Technol.
  doi: 10.1145/1961189.1961199
– volume: 3
  start-page: 1137
  year: 2003
  ident: 2021121718030776200_B51
  article-title: A neural probabilistic language model
  publication-title: J. Mach. Learn. Res.
– volume: 12
  start-page: 2825
  year: 2011
  ident: 2021121718030776200_B82
  article-title: Scikit-learn: machine learning in Python
  publication-title: J. Mach. Learn. Res.
– volume: 7
  start-page: 165241
  year: 2019
  ident: 2021121718030776200_B108
  article-title: iEsGene-ZCPseKNC: identify essential genes based on Z curve pseudo k-tuple nucleotide composition
  publication-title: IEEE Access
  doi: 10.1109/ACCESS.2019.2952237
– volume: 9
  start-page: 510
  year: 2008
  ident: 2021121718030776200_B42
  article-title: A discriminative method for protein remote homology detection and fold recognition combining Top-n-grams and latent semantic analysis
  publication-title: BMC Bioinformatics
  doi: 10.1186/1471-2105-9-510
– start-page: 183
  year: 2020
  ident: 2021121718030776200_B8
  article-title: Few-Shot NLG with Pre-Trained Language Model
  publication-title: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL)
  doi: 10.18653/v1/2020.acl-main.18
– volume: 4
  start-page: e1000134
  year: 2008
  ident: 2021121718030776200_B34
  article-title: Predicting human nucleosome occupancy from primary sequence
  publication-title: PLoS Comput. Biol.
  doi: 10.1371/journal.pcbi.1000134
– start-page: 286
  volume-title: Proceedings of the 18th European conference on Machine Learning
  year: 2007
  ident: 2021121718030776200_B80
  article-title: Fast Optimization Methods for L1 Regularization: A Comparative Study and Two New Approaches
– volume: 11
  start-page: 3488
  year: 2020
  ident: 2021121718030776200_B17
  article-title: Deep learning for genomics using Janggu
  publication-title: Nat. Commun.
  doi: 10.1038/s41467-020-17155-y
– volume: 125
  start-page: 167
  year: 1994
  ident: 2021121718030776200_B97
  article-title: Fast folding and comparison of RNA secondary structures
  publication-title: Monatsh. Chem.
  doi: 10.1007/BF00818163
– volume: 10
  start-page: e0121501
  year: 2015
  ident: 2021121718030776200_B96
  article-title: Identification of real microRNA precursors with a pseudo structure status composition approach
  publication-title: PLoS One
  doi: 10.1371/journal.pone.0121501
– volume: 19
  start-page: 2483
  year: 2018
  ident: 2021121718030776200_B76
  article-title: IDP–CRF: intrinsically disordered protein/region identification based on conditional random fields
  publication-title: Int. J. Mol. Sci.
  doi: 10.3390/ijms19092483
– volume: 20
  start-page: 1280
  year: 2019
  ident: 2021121718030776200_B11
  article-title: BioSeq-Analysis: a platform for DNA, RNA and protein sequence analysis based on machine learning approaches
  publication-title: Brief. Bioinform.
  doi: 10.1093/bib/bbx165
– volume: 11
  start-page: e0153268
  year: 2016
  ident: 2021121718030776200_B38
  article-title: Accurate prediction of transposon-derived piRNAs by integrating various sequential and physicochemical features
  publication-title: PLoS One
  doi: 10.1371/journal.pone.0153268
– start-page: 243
  volume-title: Proceedings of the 9th International Conference on Machine Learning and Computing
  year: 2017
  ident: 2021121718030776200_B79
  article-title: Combining Over-Sampling and Under-Sampling Techniques for Imbalance Dataset
  doi: 10.1145/3055635.3056643
– volume: 21
  start-page: 1047
  year: 2019
  ident: 2021121718030776200_B13
  article-title: iLearn: an integrated platform and meta-learner for feature engineering, machine-learning analysis and modeling of DNA, RNA and protein sequence data
  publication-title: Brief. Bioinform.
  doi: 10.1093/bib/bbz041
– volume: 20
  start-page: 467
  year: 2004
  ident: 2021121718030776200_B37
  article-title: Mismatch string kernels for discriminative protein classification
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btg431
– volume: 42
  start-page: 12961
  year: 2014
  ident: 2021121718030776200_B40
  article-title: iPro54-PseKNC: a sequence-based predictor for identifying sigma-54 promoters in prokaryote with pseudo k-tuple nucleotide composition
  publication-title: Nucleic Acids Res.
  doi: 10.1093/nar/gku1019
– volume: 18
  start-page: 379
  year: 2017
  ident: 2021121718030776200_B109
  article-title: EL_PSSM-RT: DNA-binding residue prediction by integrating ensemble learning with PSSM relation transformation
  publication-title: BMC Bioinformatics
  doi: 10.1186/s12859-017-1792-8
– volume: 2
  start-page: 427
  year: 2017
  ident: 2021121718030776200_B55
  article-title: Bag of Tricks for Efficient Text Classification
  publication-title: Conference of the European Chapter of the Association for Computational Linguistics
– volume: 33
  start-page: 854
  year: 2017
  ident: 2021121718030776200_B100
  article-title: RBPPred: predicting RNA-binding proteins from sequence using SVM
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btw730
– volume: 40
  start-page: 1207
  year: 2007
  ident: 2021121718030776200_B86
  article-title: Texture classification and segmentation using wavelet packet frame and Gaussian mixture model
  publication-title: Pattern Recognit.
  doi: 10.1016/j.patcog.2006.09.012
– volume: 37
  start-page: 1001
  year: 1989
  ident: 2021121718030776200_B29
  article-title: A tree-based statistical language model for natural language speech recognition
  publication-title: IEEE Trans. Acoust. Speech Signal Process.
  doi: 10.1109/29.32278
– volume: 15
  start-page: S3
  year: 2014
  ident: 2021121718030776200_B43
  article-title: Using distances between Top-n-gram and residue pairs for protein remote homology detection
  publication-title: BMC Bioinformatics
  doi: 10.1186/1471-2105-15-S16-S3
– volume: 40
  start-page: 181
  year: 2014
  ident: 2021121718030776200_B81
  article-title: Fast image reconstruction with L2-regularization
  publication-title: J. Magn. Reson. Imaging
  doi: 10.1002/jmri.24365
– volume: 9
  start-page: 495
  year: 2018
  ident: 2021121718030776200_B28
  article-title: M6AMRFS: robust prediction of N6-methyladenosine sites with sequence-based features in multiple species
  publication-title: Front. Genet.
  doi: 10.3389/fgene.2018.00495
– volume: 5
  start-page: 290
  year: 2001
  ident: 2021121718030776200_B94
  article-title: ECG data compression using truncated singular value decomposition
  publication-title: IEEE Trans. Inf. Technol. Biomed.
  doi: 10.1109/4233.966104
– volume: 10
  start-page: 142
  year: 1954
  ident: 2021121718030776200_B52
  article-title: Distributional Structure
  publication-title: Word
  doi: 10.1080/00437956.1954.11659520
– volume: 4
  start-page: 267
  year: 2012
  ident: 2021121718030776200_B70
  article-title: An introduction to conditional random fields
  publication-title: Found. Trends Mach. Learn.
  doi: 10.1561/2200000013
– year: 2017
  ident: 2021121718030776200_B74
  article-title: Weighted transformer network for machine translation
– volume: 19
  start-page: 1531
  year: 2003
  ident: 2021121718030776200_B62
  article-title: Probabilistic scoring measures for profile-profile comparison yield more accurate short seed alignments
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btg185
– volume: 47
  start-page: 4406
  year: 2019
  ident: 2021121718030776200_B46
  article-title: TriPepSVM: de novo prediction of RNA-binding proteins based on short amino acid motifs
  publication-title: Nucleic Acids Res.
  doi: 10.1093/nar/gkz203
– volume: 21
  start-page: 1733
  year: 2020
  ident: 2021121718030776200_B56
  article-title: DeepSVM-fold: protein fold recognition by combining support vector machines and pairwise sequence similarity scores generated by deep learning networks
  publication-title: Brief. Bioinform.
  doi: 10.1093/bib/bbz098
– volume: 25
  start-page: 2655
  year: 2009
  ident: 2021121718030776200_B24
  article-title: A new taxonomy-based protein fold recognition approach based on autocross-covariance transformation
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btp500
– volume: 31
  start-page: 264
  year: 1999
  ident: 2021121718030776200_B83
  article-title: Data clustering: a review
  publication-title: ACM Comput. Surv.
  doi: 10.1145/331499.331504
– start-page: 404
  volume-title: Proceedings of the 2004 conference on Empirical Methods in Natural Language Processing
  year: 2004
  ident: 2021121718030776200_B31
  article-title: TextRank: Bringing order into texts
– start-page: 583
  year: 1997
  ident: 2021121718030776200_B93
  article-title: Kernel Principal Component Analysis
  publication-title: Proceedings of the 7th International Conference on Artificial Neural Networks
– volume: 7
  start-page: 68
  year: 2006
  ident: 2021121718030776200_B5
  article-title: Protein linguistics - a grammar for modular protein assembly?
  publication-title: Nat. Rev. Mol. Cell Biol.
  doi: 10.1038/nrm1785
– start-page: 226
  volume-title: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining
  year: 1996
  ident: 2021121718030776200_B85
  article-title: A density-based algorithm for discovering clusters in large spatial databases with noise
– start-page: 1724
  year: 2014
  ident: 2021121718030776200_B72
  article-title: Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation
  publication-title: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing
  doi: 10.3115/v1/D14-1179
– volume: 53
  start-page: 226
  year: 2012
  ident: 2021121718030776200_B78
  article-title: Preprocessing unbalanced data using support vector machine
  publication-title: Decision Support Systems
  doi: 10.1016/j.dss.2012.01.016
– volume: 315
  start-page: 972
  year: 2007
  ident: 2021121718030776200_B84
  article-title: Clustering by passing messages between data points
  publication-title: Science
  doi: 10.1126/science.1136800
– start-page: 248
  volume-title: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing
  year: 2009
  ident: 2021121718030776200_B50
  article-title: Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora
StartPage e129
SubjectTerms Deoxyribonuclease I
DNA-Binding Proteins - chemistry
Intrinsically Disordered Proteins - chemistry
Methods Online
MicroRNAs - chemistry
Models, Statistical
Natural Language Processing
Nucleic Acid Conformation
RNA Precursors - chemistry
RNA-Binding Proteins - chemistry
Sequence Analysis, DNA - methods
Sequence Analysis, Protein - methods
Sequence Analysis, RNA - methods
Software
Title BioSeq-BLM: a platform for analyzing DNA, RNA and protein sequences based on biological language models
URI https://www.ncbi.nlm.nih.gov/pubmed/34581805
https://www.proquest.com/docview/2577458303
https://pubmed.ncbi.nlm.nih.gov/PMC8682797
Volume 49