ProPythia: A Python package for protein classification based on machine and deep learning

Bibliographic details
Published in: Neurocomputing (Amsterdam), Volume 484, pp. 172-182
Main authors: Sequeira, Ana Marta; Lousa, Diana; Rocha, Miguel
Format: Journal Article
Language: English
Published: Elsevier B.V., 1 May 2022
ISSN: 0925-2312, 1872-8286
Abstract: The field of protein data mining has grown rapidly in recent years. Characterizing proteins and determining their function from their amino acid sequences are challenging, long-standing problems in which Bioinformatics and Machine Learning play an emerging role. A myriad of machine and deep learning algorithms have been applied to these tasks with exciting results. However, tools and platforms that take protein sequences as input, calculate protein features, and run both Machine Learning (ML) and Deep Learning (DL) pipelines are still scarce, and the existing ones are limited in performance, user-friendliness, and domain of application. To address these limitations, we propose ProPythia, a generic and modular Python package that makes it easy to deploy ML and DL approaches for a wide range of problems in protein sequence analysis and classification. It facilitates the implementation, comparison, and validation of the major tasks in ML and DL pipelines, with modules to read and alter sequences, calculate protein features, preprocess datasets, perform feature selection and dimensionality reduction, perform clustering and manifold analysis, and train and optimize ML/DL models and use them to make predictions. ProPythia has an adaptable modular architecture, making it a versatile and easy-to-use tool for transforming protein data into valuable knowledge, even for people not familiar with ML code. The platform was tested in several applications and compared with results from the literature. Here, we illustrate its applicability in two case studies: the prediction of antimicrobial peptides and the prediction of enzyme Enzyme Commission (EC) numbers. Furthermore, we assess the performance of the different descriptors on four protein classification challenges. The source code and documentation, including a user guide and case studies, are freely available at https://github.com/BioSystemsUM/propythia.
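The "calculate protein features" step mentioned in the abstract can be illustrated with a minimal sketch. It computes amino acid composition, a common sequence descriptor, in plain Python. This is not ProPythia's API: the names `AMINO_ACIDS`, `amino_acid_composition`, and `to_feature_matrix` are hypothetical and chosen only for illustration, and the package's own descriptor modules may work differently.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature columns are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in a protein sequence.

    Hypothetical helper for illustration; non-standard characters are ignored.
    """
    counts = Counter(res for res in sequence.upper() if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def to_feature_matrix(sequences):
    """Turn a list of sequences into rows of 20 composition features each."""
    rows = []
    for seq in sequences:
        composition = amino_acid_composition(seq)
        rows.append([composition[aa] for aa in AMINO_ACIDS])
    return rows
```

A matrix built this way (one row per sequence, one column per residue type) is the kind of numeric input that a downstream ML classifier would consume.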
Authors:
1. Sequeira, Ana Marta (id9417@alunos.uminho.pt), CEB-Centre Biological Engineering, University of Minho, 4710-057 Braga, Portugal
2. Lousa, Diana (dlousa@itqb.unl.pt), Protein Modelling Laboratory, Instituto de Tecnologia Química e Biológica António Xavier (ITQB NOVA), Universidade Nova de Lisboa, 2780-157 Oeiras, Portugal
3. Rocha, Miguel (mrocha@di.uminho.pt), CEB-Centre Biological Engineering, University of Minho, 4710-057 Braga, Portugal
Copyright: 2021 Elsevier B.V.
DOI: 10.1016/j.neucom.2021.07.102
Discipline: Computer Science
Open access: yes
Peer reviewed: yes
Keywords: Deep learning; Python package; Enzyme; Protein/peptide classification; Machine learning; Antimicrobial peptide
Open access link: http://hdl.handle.net/1822/76505
Page count: 11
  year: 2018
  ident: 10.1016/j.neucom.2021.07.102_b0235
  article-title: Pysster: Classification of biological sequences by learning sequence and structure motifs with convolutional neural networks
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/bty222
– volume: 25
  start-page: 1422
  issue: 11
  year: 2009
  ident: 10.1016/j.neucom.2021.07.102_b0145
  article-title: Biopython: Freely available Python tools for computational molecular biology and bioinformatics
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btp163
– volume: 2016
  start-page: 1
  year: 2016
  ident: 10.1016/j.neucom.2021.07.102_b0275
  article-title: ASAP: A machine learning framework for local protein properties
  publication-title: Database
  doi: 10.1093/database/baw133
– volume: 10
  start-page: 1
  issue: 11
  year: 2015
  ident: 10.1016/j.neucom.2021.07.102_b0140
  article-title: Continuous distributed representation of biological sequences for deep proteomics and genomics
  publication-title: PLoS ONE
  doi: 10.1371/journal.pone.0141287
– ident: 10.1016/j.neucom.2021.07.102_b0015
  doi: 10.1101/626507
– ident: 10.1016/j.neucom.2021.07.102_b0160
  doi: 10.1021/ci400127q
– ident: 10.1016/j.neucom.2021.07.102_b0030
  doi: 10.1007/978-1-60327-194-3_2
– ident: 10.1016/j.neucom.2021.07.102_b0090
– volume: 10
  start-page: 1
  issue: JAN
  year: 2019
  ident: 10.1016/j.neucom.2021.07.102_b0180
  article-title: mlDEEPre: Multi-functional enzyme function prediction with hierarchical multi-label deep learning
  publication-title: Frontiers in Genetics
– ident: 10.1016/j.neucom.2021.07.102_b0010
  doi: 10.1093/nar/25.17.3389
– volume: 1
  year: 2014
  ident: 10.1016/j.neucom.2021.07.102_b0190
  article-title: An empirical study of different approaches for protein classification
  publication-title: Sci. World J.
– volume: 00
  start-page: 1
  issue: January
  year: 2019
  ident: 10.1016/j.neucom.2021.07.102_b0220
  article-title: iLearn: an integrated platform and meta-learner for feature engineering, machine-learning analysis and modeling of DNA
  publication-title: RNA and protein sequence data, Briefings in Bioinformatics
– volume: 47
  issue: 20
  year: 2019
  ident: 10.1016/j.neucom.2021.07.102_b0290
  article-title: BioSeq-Analysis2.0: an updated platform for analyzing DNA, RNA and protein sequences at sequence level and residue level based on machine learning approaches
  publication-title: Nucleic acids research
  doi: 10.1093/nar/gkz740
– year: 2017
  ident: 10.1016/j.neucom.2021.07.102_b0100
  publication-title: Deep Learning with Python
– start-page: 2
  year: 2019
  ident: 10.1016/j.neucom.2021.07.102_b0285
  article-title: PyFeat: a Python-based effective feature generation tool for DNA, RNA and protein sequences
  publication-title: Bioinformatics
– year: 2015
  ident: 10.1016/j.neucom.2021.07.102_b0025
– volume: 15
  start-page: 1
  issue: 1
  year: 2014
  ident: 10.1016/j.neucom.2021.07.102_b0270
  article-title: SPiCE: A web-based tool for sequence-based protein classification and exploration
  publication-title: BMC Bioinformatics
  doi: 10.1186/1471-2105-15-93
– volume: 34
  start-page: 760
  issue: 5
  year: 2018
  ident: 10.1016/j.neucom.2021.07.102_b0040
  article-title: DEEPre: Sequence-based enzyme EC number prediction by deep learning
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btx680
– volume: 21
  start-page: 1
  issue: 1
  year: 2020
  ident: 10.1016/j.neucom.2021.07.102_b0070
  article-title: ACEP: Improving antimicrobial peptides recognition through automatic feature fusion and amino acid embedding
  publication-title: BMC Genomics
  doi: 10.1186/s12864-020-06978-0
– volume: 429
  start-page: 416
  issue: 3
  year: 2017
  ident: 10.1016/j.neucom.2021.07.102_b0250
  article-title: PROFEAT Update: A Protein Features Web Server with Added Facility to Compute Network Descriptors for Studying Omics-Derived Networks
  publication-title: Journal of Molecular Biology
  doi: 10.1016/j.jmb.2016.10.013
– start-page: 1
  year: 2021
  ident: 10.1016/j.neucom.2021.07.102_b0295
  article-title: iLearnPlus: a comprehensive and automated machine-learning platform for nucleic acid and protein sequence analysis, prediction and visualization
  publication-title: Nucleic Acids Res.
– volume: 12
  start-page: 1
  issue: 1
  year: 2019
  ident: 10.1016/j.neucom.2021.07.102_b0110
  article-title: Encodings and models for antimicrobial peptide classification for multi-resistant pathogens
  publication-title: BioData Mining
  doi: 10.1186/s13040-019-0196-x
– volume: 22
  start-page: 474
  issue: 1
  year: 2021
  ident: 10.1016/j.neucom.2021.07.102_b0265
  article-title: BioMedR: An R/CRAN package for integrated data analysis pipeline in biomedical study
  publication-title: Brief. Bioinform.
  doi: 10.1093/bib/bbz150
– ident: 10.1016/j.neucom.2021.07.102_b0210
  doi: 10.1016/j.ab.2019.04.011
– volume: 33
  start-page: 2753
  issue: 17
  year: 2017
  ident: 10.1016/j.neucom.2021.07.102_b0150
  article-title: modlAMP: Python for antimicrobial peptides
  publication-title: Bioinformatics (Oxford, England)
– ident: 10.1016/j.neucom.2021.07.102_b0095
– ident: 10.1016/j.neucom.2021.07.102_b0115
  doi: 10.1093/bioinformatics/btx531
– volume: 19
  start-page: 1
  issue: 1
  year: 2018
  ident: 10.1016/j.neucom.2021.07.102_b0050
  article-title: ECPred: A tool for the prediction of the enzymatic functions of protein sequences based on the EC nomenclature
  publication-title: BMC Bioinformatics
  doi: 10.1186/s12859-018-2368-y
– ident: 10.1016/j.neucom.2021.07.102_b0055
  doi: 10.3389/fbioe.2020.00391
– volume: 8
  start-page: 13338
  issue: 8
  year: 2017
  ident: 10.1016/j.neucom.2021.07.102_b0280
  article-title: Pse-Analysis: A python package for DNA/RNA and protein/ peptide sequence analysis based on pseudo components and kernel methods
  publication-title: Oncotarget
  doi: 10.18632/oncotarget.14524
– ident: 10.1016/j.neucom.2021.07.102_b0120
  doi: 10.1016/j.eswa.2010.09.005
– volume: 34
  start-page: 2740
  issue: 16
  year: 2018
  ident: 10.1016/j.neucom.2021.07.102_b0065
  article-title: Deep learning improves antimicrobial peptide recognition
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/bty179
– volume: 00
  start-page: 1
  issue: August
  year: 2019
  ident: 10.1016/j.neucom.2021.07.102_b0005
  article-title: Deep learning for mining protein data
  publication-title: Briefings in Bioinformatics
– volume: 29
  start-page: 960
  issue: 7
  year: 2013
  ident: 10.1016/j.neucom.2021.07.102_b0240
  article-title: Propy: A tool to generate various modes of Chou’s PseAAC
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btt072
– ident: 10.1016/j.neucom.2021.07.102_b0300
  doi: 10.1093/bib/bbx165
– volume: 31
  start-page: 3429
  issue: 21
  year: 2015
  ident: 10.1016/j.neucom.2021.07.102_b0255
  article-title: ProFET: Feature engineering captures high-level protein functions
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btv345
– ident: 10.1016/j.neucom.2021.07.102_b0060
  doi: 10.1038/s41598-018-19752-w
– volume: 3
  start-page: 1
  issue: 2
  year: 2021
  ident: 10.1016/j.neucom.2021.07.102_b0195
  article-title: A large-scale comparative study on peptide encodings for biomedical classification
  publication-title: NAR Genomics Bioinforma.
  doi: 10.1093/nargab/lqab039
– volume: 31
  start-page: 1857
  issue: 11
  year: 2015
  ident: 10.1016/j.neucom.2021.07.102_b0260
  article-title: Protr/ProtrWeb: R package and web server for generating various numerical representation schemes of protein sequences
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btv042
– volume: 59
  start-page: 1
  year: 2015
  ident: 10.1016/j.neucom.2021.07.102_b0170
  article-title: Enzymes: principles and biotechnological applications
  publication-title: Essays in Biochemistry
  doi: 10.1042/bse0590001
– ident: 10.1016/j.neucom.2021.07.102_b0165
– volume: 47
  start-page: D542
  issue: D1
  year: 2019
  ident: 10.1016/j.neucom.2021.07.102_b0185
  article-title: BRENDA in 2019: A European ELIXIR core data resource
  publication-title: Nucleic Acids Research
  doi: 10.1093/nar/gky1048
– volume: 8
  start-page: 1
  issue: September
  year: 2020
  ident: 10.1016/j.neucom.2021.07.102_b0205
  article-title: Incorporating Deep Learning With Word Embedding to Identify Plant Ubiquitylation Sites
  publication-title: Frontiers in Cell and Developmental Biology
– ident: 10.1016/j.neucom.2021.07.102_b0135
  doi: 10.1101/2020.09.04.282814
– volume: 116
  start-page: 13996
  issue: 28
  year: 2019
  ident: 10.1016/j.neucom.2021.07.102_b0045
  article-title: Deep learning enables high-quality and high-throughput prediction of enzyme commission numbers
  publication-title: Proceedings of the National Academy of Sciences of the United States of America
  doi: 10.1073/pnas.1821905116
– start-page: 1
  year: 2020
  ident: 10.1016/j.neucom.2021.07.102_b0130
– volume: 19
  start-page: 1978
  issue: 15
  year: 2003
  ident: 10.1016/j.neucom.2021.07.102_b0215
  article-title: Application of support vector machines for T-cell epitopes prediction
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btg255
– volume: 557
  start-page: 503
  issue: 7706
  year: 2018
  ident: 10.1016/j.neucom.2021.07.102_b0020
  article-title: Mutant phenotypes for thousands of bacterial genes of unknown function
  publication-title: Nature
  doi: 10.1038/s41586-018-0124-0
– ident: 10.1016/j.neucom.2021.07.102_b0230
  doi: 10.1016/j.patter.2020.100178
– volume: 88
  start-page: 397
  issue: 3
  year: 2020
  ident: 10.1016/j.neucom.2021.07.102_b0080
  article-title: Machine learning techniques for protein function prediction, Proteins: Structure
  publication-title: Function and Bioinformatics
  doi: 10.1002/prot.25832
– volume: 12
  start-page: 2825
  year: 2011
  ident: 10.1016/j.neucom.2021.07.102_b0085
  article-title: Scikit-learn: Machine learning in Python
  publication-title: Journal of Machine Learning Research
– ident: 10.1016/j.neucom.2021.07.102_b0125
  doi: 10.1021/jm9700575
– ident: 10.1016/j.neucom.2021.07.102_b0155
  doi: 10.1101/599126
– ident: 10.1016/j.neucom.2021.07.102_b0035
  doi: 10.1073/pnas.1609893113
– ident: 10.1016/j.neucom.2021.07.102_b0200
  doi: 10.18632/oncotarget.20365
StartPage 172
SubjectTerms Antimicrobial peptide
Deep learning
Enzyme
Machine learning
Protein/peptide classification
Python Package
Title ProPythia: A Python package for protein classification based on machine and deep learning
URI https://dx.doi.org/10.1016/j.neucom.2021.07.102
Volume 484