ProPythia: A Python package for protein classification based on machine and deep learning
The field of protein data mining has grown rapidly in recent years. Characterizing proteins and determining their function from their amino acid sequences are challenging, long-standing problems in which Bioinformatics and Machine Learning play an emerging role. A myriad of machine and deep...
| Published in: | Neurocomputing (Amsterdam) Vol. 484; pp. 172 - 182 |
|---|---|
| Main Authors: | Sequeira, Ana Marta; Lousa, Diana; Rocha, Miguel |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V, 01.05.2022 |
| ISSN: | 0925-2312, 1872-8286 |
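The abstract in this record describes pipelines that start from amino acid sequences and "calculate protein features" before any model is trained. As a minimal sketch of what such a feature can look like, the block below computes amino acid composition (the fraction of each of the 20 standard residues), which yields a fixed-length numeric vector for sequences of any length. This is an illustrative assumption, not ProPythia's API: the names `AMINO_ACIDS`, `amino_acid_composition` and `featurize` are hypothetical, and the record does not state which descriptors the package implements or how it exposes them.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> dict[str, float]:
    """Return the fraction of each standard amino acid in `sequence`.

    Hypothetical helper for illustration; not part of ProPythia's API.
    Symbols outside the 20 standard residues (e.g. X, B, Z) are ignored.
    """
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    counts = Counter(residues)
    total = len(residues)
    # Counter returns 0 for residues that never occur.
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def featurize(sequences: list[str]) -> list[list[float]]:
    """Turn sequences into fixed-length (20-dimensional) feature rows."""
    return [list(amino_acid_composition(s).values()) for s in sequences]


if __name__ == "__main__":
    rows = featurize(["ACDA", "KKLLKK"])
    print(len(rows), len(rows[0]))
```

Rows produced this way have the same length regardless of sequence length, which is what allows them to be passed to a generic classifier or clustering routine in the later pipeline stages the abstract lists.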
| Abstract | The field of protein data mining has grown rapidly in recent years. Characterizing proteins and determining their function from their amino acid sequences are challenging, long-standing problems in which Bioinformatics and Machine Learning play an emerging role. A myriad of machine and deep learning algorithms have been applied to these tasks with exciting results. However, tools and platforms that take protein sequences as input, calculate protein features and run both Machine Learning (ML) and Deep Learning (DL) pipelines are still scarce, and the existing ones are limited in performance, user-friendliness and domains of application.
Here, to address these limitations, we propose ProPythia, a generic and modular Python package that makes it easy to deploy ML and DL approaches for a wide range of problems in protein sequence analysis and classification. It facilitates the implementation, comparison and validation of the major tasks in ML or DL pipelines, including modules to read and alter sequences, calculate protein features, preprocess datasets, perform feature selection and dimensionality reduction, perform clustering and manifold analysis, and train and optimize ML/DL models and use them to make predictions.
ProPythia has an adaptable modular architecture, making it a versatile and easy-to-use tool that will help transform protein data into valuable knowledge, even for people unfamiliar with ML code. The platform was tested in several applications, and its results were compared with those from the literature. Here, we illustrate its applicability in two case studies: the prediction of antimicrobial peptides and the prediction of Enzyme Commission (EC) numbers of enzymes. Furthermore, we assess the performance of the different descriptors on four different protein classification challenges. Its source code and documentation, including a user guide and case studies, are freely available at https://github.com/BioSystemsUM/propythia. |
|---|---|
| Author | Lousa, Diana Rocha, Miguel Sequeira, Ana Marta |
| Author_xml | 1. Sequeira, Ana Marta (id9417@alunos.uminho.pt), CEB-Centre Biological Engineering, University of Minho, 4710-057 Braga, Portugal; 2. Lousa, Diana (dlousa@itqb.unl.pt), Protein Modelling Laboratory, Instituto de Tecnologia Química e Biológica António Xavier (ITQB NOVA), Universidade Nova de Lisboa, 2780-157 Oeiras, Portugal; 3. Rocha, Miguel (mrocha@di.uminho.pt), CEB-Centre Biological Engineering, University of Minho, 4710-057 Braga, Portugal |
| ContentType | Journal Article |
| Copyright | 2021 Elsevier B.V. |
| DOI | 10.1016/j.neucom.2021.07.102 |
| Discipline | Computer Science |
| EISSN | 1872-8286 |
| EndPage | 182 |
| ISICitedReferencesCount | 17 |
| ISSN | 0925-2312 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | Deep learning; Python Package; Enzyme; Protein/peptide classification; Machine learning; Antimicrobial peptide |
| Language | English |
| OpenAccessLink | http://hdl.handle.net/1822/76505 |
| PageCount | 11 |
| PublicationCentury | 2000 |
| PublicationDate | 2022-05-01 2022-05-00 |
| PublicationDateYYYYMMDD | 2022-05-01 |
| PublicationDecade | 2020 |
| PublicationTitle | Neurocomputing (Amsterdam) |
| PublicationYear | 2022 |
| Publisher | Elsevier B.V |
recognition through automatic feature fusion and amino acid embedding publication-title: BMC Genomics doi: 10.1186/s12864-020-06978-0 – volume: 429 start-page: 416 issue: 3 year: 2017 ident: 10.1016/j.neucom.2021.07.102_b0250 article-title: PROFEAT Update: A Protein Features Web Server with Added Facility to Compute Network Descriptors for Studying Omics-Derived Networks publication-title: Journal of Molecular Biology doi: 10.1016/j.jmb.2016.10.013 – start-page: 1 year: 2021 ident: 10.1016/j.neucom.2021.07.102_b0295 article-title: iLearnPlus: a comprehensive and automated machine-learning platform for nucleic acid and protein sequence analysis, prediction and visualization publication-title: Nucleic Acids Res. – volume: 12 start-page: 1 issue: 1 year: 2019 ident: 10.1016/j.neucom.2021.07.102_b0110 article-title: Encodings and models for antimicrobial peptide classification for multi-resistant pathogens publication-title: BioData Mining doi: 10.1186/s13040-019-0196-x – volume: 22 start-page: 474 issue: 1 year: 2021 ident: 10.1016/j.neucom.2021.07.102_b0265 article-title: BioMedR: An R/CRAN package for integrated data analysis pipeline in biomedical study publication-title: Brief. Bioinform. 
doi: 10.1093/bib/bbz150 – ident: 10.1016/j.neucom.2021.07.102_b0210 doi: 10.1016/j.ab.2019.04.011 – volume: 33 start-page: 2753 issue: 17 year: 2017 ident: 10.1016/j.neucom.2021.07.102_b0150 article-title: modlAMP: Python for antimicrobial peptides publication-title: Bioinformatics (Oxford, England) – ident: 10.1016/j.neucom.2021.07.102_b0095 – ident: 10.1016/j.neucom.2021.07.102_b0115 doi: 10.1093/bioinformatics/btx531 – volume: 19 start-page: 1 issue: 1 year: 2018 ident: 10.1016/j.neucom.2021.07.102_b0050 article-title: ECPred: A tool for the prediction of the enzymatic functions of protein sequences based on the EC nomenclature publication-title: BMC Bioinformatics doi: 10.1186/s12859-018-2368-y – ident: 10.1016/j.neucom.2021.07.102_b0055 doi: 10.3389/fbioe.2020.00391 – volume: 8 start-page: 13338 issue: 8 year: 2017 ident: 10.1016/j.neucom.2021.07.102_b0280 article-title: Pse-Analysis: A python package for DNA/RNA and protein/ peptide sequence analysis based on pseudo components and kernel methods publication-title: Oncotarget doi: 10.18632/oncotarget.14524 – ident: 10.1016/j.neucom.2021.07.102_b0120 doi: 10.1016/j.eswa.2010.09.005 – volume: 34 start-page: 2740 issue: 16 year: 2018 ident: 10.1016/j.neucom.2021.07.102_b0065 article-title: Deep learning improves antimicrobial peptide recognition publication-title: Bioinformatics doi: 10.1093/bioinformatics/bty179 – volume: 00 start-page: 1 issue: August year: 2019 ident: 10.1016/j.neucom.2021.07.102_b0005 article-title: Deep learning for mining protein data publication-title: Briefings in Bioinformatics – volume: 29 start-page: 960 issue: 7 year: 2013 ident: 10.1016/j.neucom.2021.07.102_b0240 article-title: Propy: A tool to generate various modes of Chou’s PseAAC publication-title: Bioinformatics doi: 10.1093/bioinformatics/btt072 – ident: 10.1016/j.neucom.2021.07.102_b0300 doi: 10.1093/bib/bbx165 – volume: 31 start-page: 3429 issue: 21 year: 2015 ident: 10.1016/j.neucom.2021.07.102_b0255 article-title: ProFET: 
Feature engineering captures high-level protein functions publication-title: Bioinformatics doi: 10.1093/bioinformatics/btv345 – ident: 10.1016/j.neucom.2021.07.102_b0060 doi: 10.1038/s41598-018-19752-w – volume: 3 start-page: 1 issue: 2 year: 2021 ident: 10.1016/j.neucom.2021.07.102_b0195 article-title: A large-scale comparative study on peptide encodings for biomedical classification publication-title: NAR Genomics Bioinforma. doi: 10.1093/nargab/lqab039 – volume: 31 start-page: 1857 issue: 11 year: 2015 ident: 10.1016/j.neucom.2021.07.102_b0260 article-title: Protr/ProtrWeb: R package and web server for generating various numerical representation schemes of protein sequences publication-title: Bioinformatics doi: 10.1093/bioinformatics/btv042 – volume: 59 start-page: 1 year: 2015 ident: 10.1016/j.neucom.2021.07.102_b0170 article-title: Enzymes: principles and biotechnological applications publication-title: Essays in Biochemistry doi: 10.1042/bse0590001 – ident: 10.1016/j.neucom.2021.07.102_b0165 – volume: 47 start-page: D542 issue: D1 year: 2019 ident: 10.1016/j.neucom.2021.07.102_b0185 article-title: BRENDA in 2019: A European ELIXIR core data resource publication-title: Nucleic Acids Research doi: 10.1093/nar/gky1048 – volume: 8 start-page: 1 issue: September year: 2020 ident: 10.1016/j.neucom.2021.07.102_b0205 article-title: Incorporating Deep Learning With Word Embedding to Identify Plant Ubiquitylation Sites publication-title: Frontiers in Cell and Developmental Biology – ident: 10.1016/j.neucom.2021.07.102_b0135 doi: 10.1101/2020.09.04.282814 – volume: 116 start-page: 13996 issue: 28 year: 2019 ident: 10.1016/j.neucom.2021.07.102_b0045 article-title: Deep learning enables high-quality and high-throughput prediction of enzyme commission numbers publication-title: Proceedings of the National Academy of Sciences of the United States of America doi: 10.1073/pnas.1821905116 – start-page: 1 year: 2020 ident: 10.1016/j.neucom.2021.07.102_b0130 – volume: 19 
start-page: 1978 issue: 15 year: 2003 ident: 10.1016/j.neucom.2021.07.102_b0215 article-title: Application of support vector machines for T-cell epitopes prediction publication-title: Bioinformatics doi: 10.1093/bioinformatics/btg255 – volume: 557 start-page: 503 issue: 7706 year: 2018 ident: 10.1016/j.neucom.2021.07.102_b0020 article-title: Mutant phenotypes for thousands of bacterial genes of unknown function publication-title: Nature doi: 10.1038/s41586-018-0124-0 – ident: 10.1016/j.neucom.2021.07.102_b0230 doi: 10.1016/j.patter.2020.100178 – volume: 88 start-page: 397 issue: 3 year: 2020 ident: 10.1016/j.neucom.2021.07.102_b0080 article-title: Machine learning techniques for protein function prediction, Proteins: Structure publication-title: Function and Bioinformatics doi: 10.1002/prot.25832 – volume: 12 start-page: 2825 year: 2011 ident: 10.1016/j.neucom.2021.07.102_b0085 article-title: Scikit-learn: Machine learning in Python publication-title: Journal of Machine Learning Research – ident: 10.1016/j.neucom.2021.07.102_b0125 doi: 10.1021/jm9700575 – ident: 10.1016/j.neucom.2021.07.102_b0155 doi: 10.1101/599126 – ident: 10.1016/j.neucom.2021.07.102_b0035 doi: 10.1073/pnas.1609893113 – ident: 10.1016/j.neucom.2021.07.102_b0200 doi: 10.18632/oncotarget.20365 |
| StartPage | 172 |
| SubjectTerms | Antimicrobial peptide; Deep learning; Enzyme; Machine learning; Protein/peptide classification; Python Package |
| Title | ProPythia: A Python package for protein classification based on machine and deep learning |
| URI | https://dx.doi.org/10.1016/j.neucom.2021.07.102 |
| Volume | 484 |