BioSeq-BLM: a platform for analyzing DNA, RNA and protein sequences based on biological language models
| Published in: | Nucleic Acids Research Vol. 49; no. 22; p. e129 |
|---|---|
| Main Authors: | Li, Hong-Liang; Pang, Yi-He; Liu, Bin |
| Format: | Journal Article |
| Language: | English |
| Published: | England: Oxford University Press, 16.12.2021 |
| ISSN: | 0305-1048, 1362-4962 |
| Abstract | In order to uncover the meanings of the ‘book of life’, this study discusses 155 different biological language models (BLMs) for DNA, RNA and protein sequence analysis that are able to extract the linguistic properties of the ‘book of life’. We also extend the BLMs into a system called BioSeq-BLM for automatically representing and analyzing sequence data. Experimental results show that the predictors generated by BioSeq-BLM achieve performance comparable to, or clearly better than, the existing state-of-the-art predictors published in the literature, indicating that BioSeq-BLM can provide new approaches for biological sequence analysis based on natural language processing technologies and contribute to the development of this important field. To help readers use BioSeq-BLM in their own experiments, the corresponding web server and stand-alone package have been released and can be freely accessed at http://bliulab.net/BioSeq-BLM/. |
|---|---|
| Author | Li, Hong-Liang; Pang, Yi-He; Liu, Bin |
| Author_xml | Li, Hong-Liang (email: bliu@bliulab.net); Pang, Yi-He; Liu, Bin (ORCID: 0000-0003-3685-9469; email: bliu@bliulab.net) |
| BackLink | View this record in MEDLINE/PubMed: https://www.ncbi.nlm.nih.gov/pubmed/34581805 |
| ContentType | Journal Article |
| Copyright | The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research. |
| DOI | 10.1093/nar/gkab829 |
| Discipline | Anatomy & Physiology Chemistry |
| EISSN | 1362-4962 |
| EndPage | e129 |
| ExternalDocumentID | PMC8682797; PMID 34581805; DOI 10.1093/nar/gkab829 |
| Genre | Research Support, Non-U.S. Gov't; Journal Article |
| GrantInformation_xml | Grant IDs: JQ19019; 61822306, 61861146002 and 61732012; 2018AAA0100100 |
| ISICitedReferencesCount | 185 |
| ISSN | 0305-1048 1362-4962 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 22 |
| Language | English |
| License | This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com. |
| ORCID | 0000-0003-3685-9469 |
| OpenAccessLink | https://dx.doi.org/10.1093/nar/gkab829 |
| PMID | 34581805 |
| PublicationCentury | 2000 |
| PublicationDate | 2021-12-16 |
| PublicationDecade | 2020 |
| PublicationPlace | England |
| PublicationTitle | Nucleic acids research |
| PublicationTitleAlternate | Nucleic Acids Res |
| PublicationYear | 2021 |
| Publisher | Oxford University Press |
U.S.A doi: 10.1073/pnas.1814684116 – start-page: 133 volume-title: Proceedings of the First Instructional Conference on Machine Learning year: 2003 ident: 2021121718030776200_B45 article-title: Using tf-idf to determine word relevance in document queries – volume: 17 start-page: 763 year: 2001 ident: 2021121718030776200_B92 article-title: Principal component analysis for clustering gene expression data publication-title: Bioinformatics doi: 10.1093/bioinformatics/17.9.763 – volume: 31 start-page: 119 year: 2015 ident: 2021121718030776200_B104 article-title: PseKNC-General: a cross-platform package for generating various modes of pseudo nucleotide compositions publication-title: Bioinformatics doi: 10.1093/bioinformatics/btu602 – volume: 27 start-page: 379 year: 1948 ident: 2021121718030776200_B20 article-title: A mathematical theory of communication publication-title: Bell Syst. Tech. J. doi: 10.1002/j.1538-7305.1948.tb01338.x – volume: 34 start-page: 8 year: 2015 ident: 2021121718030776200_B98 article-title: PseDNA-Pro: DNA-binding protein identification by combining Chou's PseAAC and physicochemical distance transformation publication-title: Mol. Inf. doi: 10.1002/minf.201400025 – volume: 16 start-page: 315 year: 2019 ident: 2021121718030776200_B18 article-title: Selene: a PyTorch-based deep learning library for sequence data publication-title: Nat. Methods doi: 10.1038/s41592-019-0360-8 – volume: 15 start-page: 633 year: 2000 ident: 2021121718030776200_B87 article-title: Data mining for text categorization with semi-supervised agglomerative hierarchical clustering publication-title: Int. J. Intell. Syst. 
doi: 10.1002/(SICI)1098-111X(200007)15:7<633::AID-INT4>3.0.CO;2-8 – volume: 32 start-page: 2411 year: 2016 ident: 2021121718030776200_B95 article-title: iDHS-EL: identifying DNase I hypersensitive sites by fusing three different modes of pseudo nucleotide composition into an ensemble learning framework publication-title: Bioinformatics doi: 10.1093/bioinformatics/btw186 – start-page: 46 volume-title: Proceedings of the 18th International Conference on Database Systems for Advanced Applications year: 2013 ident: 2021121718030776200_B26 article-title: DNA Encoding for Splice Site Prediction in Large DNA Sequence doi: 10.1007/978-3-642-40270-8_4 – volume: 16 start-page: 142 year: 2015 ident: 2021121718030776200_B19 article-title: Pydna: a simulation and documentation tool for DNA assembly strategies using python publication-title: BMC Bioinformatics doi: 10.1186/s12859-015-0544-x – volume: 24 start-page: 303 year: 2011 ident: 2021121718030776200_B99 article-title: SVM based prediction of RNA-binding proteins using binding residues and evolutionary information publication-title: J. Mol. Recognit. doi: 10.1002/jmr.1061 – volume: 33 start-page: 685 year: 2017 ident: 2021121718030776200_B57 article-title: Improving protein disorder prediction by deep bidirectional long short-term memory recurrent neural networks publication-title: Bioinformatics doi: 10.1093/bioinformatics/btw678 – volume: 588 start-page: 203 year: 2020 ident: 2021121718030776200_B101 article-title: It will change everything’: DeepMind's AI makes gigantic leap in solving protein structures publication-title: Nature doi: 10.1038/d41586-020-03348-4 – volume: 37 start-page: 592 year: 2019 ident: 2021121718030776200_B16 article-title: The Kipoi repository accelerates community exchange and reuse of predictive models for genomics publication-title: Nat. Biotechnol. 
doi: 10.1038/s41587-019-0140-0 – volume: 36 start-page: D202 year: 2008 ident: 2021121718030776200_B6 article-title: AAindex: amino acid index database, progress report 2008 publication-title: Nucleic Acids Res. doi: 10.1093/nar/gkm998 – start-page: 6000 volume-title: Proceedings of the 31st International Conference on Neural Information Processing Systems year: 2017 ident: 2021121718030776200_B73 article-title: Attention is all you need – volume: 36 start-page: 4038 year: 2020 ident: 2021121718030776200_B66 article-title: An efficient approach based on multi-sources information to predict circRNA-disease associations using deep convolutional neural network publication-title: Bioinformatics doi: 10.1093/bioinformatics/btz825 – volume: 21 start-page: 930 year: 2007 ident: 2021121718030776200_B91 article-title: Feature selection using decision tree and classification through proximal support vector machine for fault diagnostics of roller bearing publication-title: Mech. Syst. Signal Process. doi: 10.1016/j.ymssp.2006.05.004 – volume: 17 start-page: 579 year: 2001 ident: 2021121718030776200_B4 article-title: Reading the book of life publication-title: Bioinformatics doi: 10.1093/bioinformatics/17.7.579 – volume: 36 start-page: 4576 year: 2020 ident: 2021121718030776200_B103 article-title: Accurate prediction of genome-wide RNA secondary structure profile based on extreme gradient boosting publication-title: Bioinformatics doi: 10.1093/bioinformatics/btaa534 – volume: 2 start-page: 27 year: 2011 ident: 2021121718030776200_B67 article-title: LIBSVM: a library for support vector machines publication-title: ACM Trans. Intell. Syst. Technol. doi: 10.1145/1961189.1961199 – volume: 3 start-page: 1137 year: 2003 ident: 2021121718030776200_B51 article-title: A neural probabilistic language model publication-title: J. Mach. Learn. Res. 
– volume: 12 start-page: 2825 year: 2011 ident: 2021121718030776200_B82 article-title: Scikit-learn: machine learning in Python publication-title: J. Mach. Learn. Res. – volume: 7 start-page: 165241 year: 2019 ident: 2021121718030776200_B108 article-title: iEsGene-ZCPseKNC: identify essential genes based on Z curve pseudo k-tuple nucleotide composition publication-title: Ieee Access doi: 10.1109/ACCESS.2019.2952237 – volume: 9 start-page: 510 year: 2008 ident: 2021121718030776200_B42 article-title: A discriminative method for protein remote homology detection and fold recognition combining Top-n-grams and latent semantic analysis publication-title: BMC Bioinformatics doi: 10.1186/1471-2105-9-510 – start-page: 183 year: 2020 ident: 2021121718030776200_B8 article-title: Few-Shot NLG with Pre-Trained Language Model publication-title: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL) doi: 10.18653/v1/2020.acl-main.18 – volume: 4 start-page: e1000134 year: 2008 ident: 2021121718030776200_B34 article-title: Predicting human nucleosome occupancy from primary sequence publication-title: PLoS Comput. Biol. doi: 10.1371/journal.pcbi.1000134 – start-page: 286 volume-title: Proceedings of the 18th European conference on Machine Learning year: 2007 ident: 2021121718030776200_B80 article-title: Fast Optimization Methods for L1 Regularization: A Comparative Study and Two New Approaches – volume: 11 start-page: 3488 year: 2020 ident: 2021121718030776200_B17 article-title: Deep learning for genomics using Janggu publication-title: Nat. Commun. doi: 10.1038/s41467-020-17155-y – volume: 125 start-page: 167 year: 1994 ident: 2021121718030776200_B97 article-title: Fast folding and comparison of rna secondary structures publication-title: Monatsh. Chem. 
doi: 10.1007/BF00818163 – volume: 10 start-page: e0121501 year: 2015 ident: 2021121718030776200_B96 article-title: Identification of real microRNA precursors with a pseudo structure status composition approach publication-title: PLoS One doi: 10.1371/journal.pone.0121501 – volume: 19 start-page: 2483 year: 2018 ident: 2021121718030776200_B76 article-title: IDP–CRF: intrinsically disordered protein/region identification based on conditional random fields publication-title: Int. J. Mol. Sci. doi: 10.3390/ijms19092483 – volume: 20 start-page: 1280 year: 2019 ident: 2021121718030776200_B11 article-title: BioSeq-Analysis: a platform for DNA, RNA and protein sequence analysis based on machine learning approaches publication-title: Brief. Bioinform. doi: 10.1093/bib/bbx165 – volume: 11 start-page: e0153268 year: 2016 ident: 2021121718030776200_B38 article-title: Accurate prediction of transposon-derived piRNAs by integrating various sequential and physicochemical features publication-title: PLoS One doi: 10.1371/journal.pone.0153268 – start-page: 243 volume-title: Proceedings of the 9th International Conference on Machine Learning and Computing year: 2017 ident: 2021121718030776200_B79 article-title: Combining Over-Sampling and Under-Sampling Techniques for Imbalance Dataset doi: 10.1145/3055635.3056643 – volume: 21 start-page: 1047 year: 2019 ident: 2021121718030776200_B13 article-title: iLearn: an integrated platform and meta-learner for feature engineering, machine-learning analysis and modeling of DNA, RNA and protein sequence data publication-title: Brief. Bioinform. 
doi: 10.1093/bib/bbz041 – volume: 20 start-page: 467 year: 2004 ident: 2021121718030776200_B37 article-title: Mismatch string kernels for discriminative protein classification publication-title: Bioinformatics doi: 10.1093/bioinformatics/btg431 – volume: 42 start-page: 12961 year: 2014 ident: 2021121718030776200_B40 article-title: iPro54-PseKNC: a sequence-based predictor for identifying sigma-54 promoters in prokaryote with pseudo k-tuple nucleotide composition publication-title: Nucleic Acids Res. doi: 10.1093/nar/gku1019 – volume: 18 start-page: 379 year: 2017 ident: 2021121718030776200_B109 article-title: EL_PSSM-RT: DNA-binding residue prediction by integrating ensemble learning with PSSM relation transformation publication-title: BMC Bioinformatics doi: 10.1186/s12859-017-1792-8 – volume: 2 start-page: 427 year: 2017 ident: 2021121718030776200_B55 article-title: Bag of Tricks for Efficient Text Classification publication-title: Conference of the European Chapter of the Association for Computational Linguistics – volume: 33 start-page: 854 year: 2017 ident: 2021121718030776200_B100 article-title: RBPPred: predicting RNA-binding proteins from sequence using SVM publication-title: Bioinformatics doi: 10.1093/bioinformatics/btw730 – volume: 40 start-page: 1207 year: 2007 ident: 2021121718030776200_B86 article-title: Texture classification and segmentation using wavelet packet frame and Gaussian mixture model publication-title: Pattern Recogn doi: 10.1016/j.patcog.2006.09.012 – volume: 37 start-page: 1001 year: 1989 ident: 2021121718030776200_B29 article-title: A tree-based statistical language model for natural language speech recognition publication-title: IEEE Trans. Acoust. Speech Signal Process. 
doi: 10.1109/29.32278 – volume: 15 start-page: S3 year: 2014 ident: 2021121718030776200_B43 article-title: Using distances between Top-n-gram and residue pairs for protein remote homology detection publication-title: BMC Bioinformatics doi: 10.1186/1471-2105-15-S16-S3 – volume: 40 start-page: 181 year: 2014 ident: 2021121718030776200_B81 article-title: Fast image reconstruction with L2-regularization publication-title: J. Magn. Reson. Imaging doi: 10.1002/jmri.24365 – volume: 9 start-page: 495 year: 2018 ident: 2021121718030776200_B28 article-title: M6AMRFS: robust prediction of N6-methyladenosine sites with sequence-based features in multiple species publication-title: Front. Genet. doi: 10.3389/fgene.2018.00495 – volume: 5 start-page: 290 year: 2001 ident: 2021121718030776200_B94 article-title: ECG data compression using truncated singular value decomposition publication-title: Trans. Info. Tech. Biomed. doi: 10.1109/4233.966104 – volume: 10 start-page: 142 year: 1954 ident: 2021121718030776200_B52 article-title: Distributional Structure publication-title: Word doi: 10.1080/00437956.1954.11659520 – volume: 4 start-page: 267 year: 2012 ident: 2021121718030776200_B70 article-title: An introduction to conditional random fields publication-title: Found. Trends Mach. Learn. doi: 10.1561/2200000013 – year: 2017 ident: 2021121718030776200_B74 article-title: Weighted transformer network for machine translation – volume: 19 start-page: 1531 year: 2003 ident: 2021121718030776200_B62 article-title: Probabilistic scoring measures for profile-profile comparison yield more accurate short seed alignments publication-title: Bioinformatics doi: 10.1093/bioinformatics/btg185 – volume: 47 start-page: 4406 year: 2019 ident: 2021121718030776200_B46 article-title: TriPepSVM: de novo prediction of RNA-binding proteins based on short amino acid motifs publication-title: Nucleic Acids Res. 
doi: 10.1093/nar/gkz203 – volume: 21 start-page: 1733 year: 2020 ident: 2021121718030776200_B56 article-title: DeepSVM-fold: protein fold recognition by combining support vector machines and pairwise sequence similarity scores generated by deep learning networks publication-title: Brief. Bioinform. doi: 10.1093/bib/bbz098 – volume: 25 start-page: 2655 year: 2009 ident: 2021121718030776200_B24 article-title: A new taxonomy-based protein fold recognition approach based on autocross-covariance transformation publication-title: Bioinformatics doi: 10.1093/bioinformatics/btp500 – volume: 31 start-page: 264 year: 1999 ident: 2021121718030776200_B83 article-title: Data clustering: a review publication-title: ACM computing surveys doi: 10.1145/331499.331504 – start-page: 404 volume-title: Proceedings of the 2004 conference on Empirical Methods in Natural Language Processing year: 2004 ident: 2021121718030776200_B31 article-title: Textrank: Bringing order into text – start-page: 583 year: 1997 ident: 2021121718030776200_B93 article-title: Kernel Principal Component Analysis publication-title: Proceedings of the 7th International Conference on Artificial Neural Networks – volume: 7 start-page: 68 year: 2006 ident: 2021121718030776200_B5 article-title: Protein linguistics - a grammar for modular protein assembly? publication-title: Nat. Rev. Mol. Cell Biol. 
doi: 10.1038/nrm1785 – start-page: 226 volume-title: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining year: 1996 ident: 2021121718030776200_B85 article-title: A density-based algorithm for discovering clusters in large spatial databases with noise – start-page: 1724 year: 2014 ident: 2021121718030776200_B72 article-title: Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation publication-title: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing doi: 10.3115/v1/D14-1179 – volume: 53 start-page: 226 year: 2012 ident: 2021121718030776200_B78 article-title: Preprocessing unbalanced data using support vector machine publication-title: Decision Support Systems doi: 10.1016/j.dss.2012.01.016 – volume: 315 start-page: 972 year: 2007 ident: 2021121718030776200_B84 article-title: Clustering by passing messages between data points publication-title: Science doi: 10.1126/science.1136800 – start-page: 248 volume-title: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing year: 2009 ident: 2021121718030776200_B50 article-title: Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora |
| StartPage | e129 |
| SubjectTerms | Deoxyribonuclease I; DNA-Binding Proteins - chemistry; Intrinsically Disordered Proteins - chemistry; Methods Online; MicroRNAs - chemistry; Models, Statistical; Natural Language Processing; Nucleic Acid Conformation; RNA Precursors - chemistry; RNA-Binding Proteins - chemistry; Sequence Analysis, DNA - methods; Sequence Analysis, Protein - methods; Sequence Analysis, RNA - methods; Software |
| Title | BioSeq-BLM: a platform for analyzing DNA, RNA and protein sequences based on biological language models |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/34581805; https://www.proquest.com/docview/2577458303; https://pubmed.ncbi.nlm.nih.gov/PMC8682797 |
| Volume | 49 |
| Authors | Li, Hong-Liang; Pang, Yi-He; Liu, Bin |
| DOI | 10.1093/nar/gkab829 |