DBP-GAPred: An intelligent method for prediction of DNA-binding proteins types by enhanced evolutionary profile features with ensemble learning

Bibliographic details
Published in: Journal of Bioinformatics and Computational Biology, Vol. 19, No. 4, p. 2150018
Main authors: Barukab, Omar; Ali, Farman; Khan, Sher Afzal
Format: Journal Article
Language: English
Published: Singapore, 1 August 2021
ISSN: 1757-6334
Abstract DNA-binding proteins (DBPs) play an important role in diverse biological activities such as DNA replication, slicing, repair, and transcription. Some DBPs are indispensable for understanding many types of human cancers (e.g. lung, breast, and liver cancer) and chronic diseases (e.g. AIDS/HIV, asthma), while others are involved in the design of antibiotics, steroids, and anti-inflammatory drugs. These processes are closely related to DBP type. DBPs are categorized into single-stranded DNA-binding proteins (ssDBPs) and double-stranded DNA-binding proteins (dsDBPs). Few computational predictors have been reported for discriminating ssDBPs from dsDBPs, and because of the limitations of the existing methods, a better computational system is still highly desirable. In this work, features are extracted from protein sequences by extending the notions of dipeptide composition (DPC), evolutionary difference formula (EDF), and K-separated bigram (KSB) to the position-specific scoring matrix (PSSM). The most informative content of these features was encoded with a compression approach, the discrete cosine transform (DCT), and the model was trained with a support vector machine (SVM). Prediction performance was further boosted by a genetic algorithm (GA) ensemble strategy. The resulting predictor (DBP-GAPred) achieved 1.89%, 0.28%, and 6.63% higher accuracies on jackknife, 10-fold, and independent dataset tests, respectively, than the best existing predictor. These outcomes indicate that the method outperforms the existing predictors.
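The feature pipeline named in the abstract (a dipeptide-composition-style descriptor computed on the PSSM, followed by DCT compression) can be illustrated with a small sketch. This is not the authors' implementation: it assumes the commonly used DPC-PSSM definition (the mean product of scores in consecutive rows for every pair of columns) and an unnormalized DCT-II, and the function names `dpc_pssm`, `dct2_leading`, and `compress` as well as the `keep` parameter are hypothetical. The record does not give the paper's exact formulas, normalization, or number of retained coefficients, and the EDF, KSB, SVM, and GA ensemble steps are not covered here.

```python
import math


def dpc_pssm(pssm):
    """Dipeptide-composition-style features from a PSSM (assumed definition).

    pssm is a list of L rows, one per residue, each with C scores
    (C = 20 for a real PSSM). For every column pair (i, j) the feature is
    the mean over k of pssm[k][i] * pssm[k + 1][j], giving C * C values.
    """
    length = len(pssm)
    if length < 2:
        raise ValueError("a PSSM needs at least two rows for dipeptide features")
    n_cols = len(pssm[0])
    features = []
    for i in range(n_cols):
        for j in range(n_cols):
            total = sum(pssm[k][i] * pssm[k + 1][j] for k in range(length - 1))
            features.append(total / (length - 1))
    return features


def dct2_leading(signal, keep):
    """First `keep` coefficients of the unnormalized 1-D DCT-II.

    X[u] = sum_n signal[n] * cos(pi * (n + 0.5) * u / N)
    """
    n = len(signal)
    if not 0 < keep <= n:
        raise ValueError("keep must be between 1 and len(signal)")
    return [
        sum(x * math.cos(math.pi * (idx + 0.5) * u / n)
            for idx, x in enumerate(signal))
        for u in range(keep)
    ]


def compress(pssm, keep):
    """DPC-PSSM features followed by DCT truncation (illustrative only)."""
    return dct2_leading(dpc_pssm(pssm), keep)
```

Keeping only the leading DCT coefficients retains the low-frequency part of the feature vector, which is the usual motivation for using the DCT as a compression step. The resulting fixed-length vector could then be passed to any classifier, such as an SVM.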
Author Ali, Farman
Barukab, Omar
Khan, Sher Afzal
Author_xml – sequence: 1
  givenname: Omar
  surname: Barukab
  fullname: Barukab, Omar
  organization: Faculty of Computing and Information Technology, King Abdulaziz University, Rabigh 21911 Jeddah, Saudi Arabia
– sequence: 2
  givenname: Farman
  orcidid: 0000-0002-0914-1577
  surname: Ali
  fullname: Ali, Farman
  organization: School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, P. R. China
– sequence: 3
  givenname: Sher Afzal
  surname: Khan
  fullname: Khan, Sher Afzal
  organization: Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan
BackLink https://www.ncbi.nlm.nih.gov/pubmed/34291709$$D View this record in MEDLINE/PubMed
CitedBy_id crossref_primary_10_1016_j_artmed_2024_102860
crossref_primary_10_1016_j_chemolab_2022_104682
crossref_primary_10_1080_07391102_2024_2329777
crossref_primary_10_1080_07391102_2023_2243523
crossref_primary_10_1016_j_ab_2024_115603
crossref_primary_10_1155_2022_5483115
crossref_primary_10_1016_j_ymeth_2024_04_004
crossref_primary_10_1080_07391102_2023_2269280
crossref_primary_10_1109_ACCESS_2023_3321100
crossref_primary_10_1016_j_chemolab_2022_104516
crossref_primary_10_1007_s11831_023_09933_w
crossref_primary_10_1016_j_chemolab_2022_104639
crossref_primary_10_1016_j_jocs_2024_102388
crossref_primary_10_1109_ACCESS_2023_3274601
crossref_primary_10_1016_j_chemolab_2022_104729
crossref_primary_10_1038_s41598_024_84146_0
crossref_primary_10_1016_j_ijbiomac_2024_136475
crossref_primary_10_1016_j_compbiomed_2022_105533
crossref_primary_10_1038_s41598_022_24501_1
crossref_primary_10_1016_j_compbiomed_2021_105006
crossref_primary_10_1016_j_bspc_2022_103856
crossref_primary_10_1016_j_ijbiomac_2025_143844
crossref_primary_10_1038_s41598_022_09484_3
crossref_primary_10_1016_j_compbiomed_2022_106311
ContentType Journal Article
DBID CGR
CUY
CVF
ECM
EIF
NPM
7X8
DOI 10.1142/S0219720021500189
DatabaseName Medline
MEDLINE
MEDLINE (Ovid)
MEDLINE
MEDLINE
PubMed
MEDLINE - Academic
DatabaseTitle MEDLINE
Medline Complete
MEDLINE with Full Text
PubMed
MEDLINE (Ovid)
MEDLINE - Academic
DatabaseTitleList MEDLINE
MEDLINE - Academic
Database_xml – sequence: 1
  dbid: NPM
  name: PubMed
  url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
– sequence: 2
  dbid: 7X8
  name: MEDLINE - Academic
  url: https://search.proquest.com/medline
  sourceTypes: Aggregation Database
DeliveryMethod no_fulltext_linktorsrc
EISSN 1757-6334
ExternalDocumentID 34291709
Genre Research Support, Non-U.S. Gov't
Journal Article
GroupedDBID CGR
CUY
CVF
ECM
EIF
NPM
7X8
ID FETCH-LOGICAL-c3709-63ddc46e05f2fef06d5fb6c36b04aa46fc6f3c7f394c0b089d97653a0f6793942
IEDL.DBID 7X8
ISICitedReferencesCount 33
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000692083900008&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
ISSN 1757-6334
IngestDate Fri Jul 11 07:58:14 EDT 2025
Thu Jan 02 22:56:43 EST 2025
IsPeerReviewed true
IsScholarly true
Issue 4
Keywords genetic algorithm
discrete cosine transform
support vector machine
position-specific scoring matrix
DNA-binding proteins
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-c3709-63ddc46e05f2fef06d5fb6c36b04aa46fc6f3c7f394c0b089d97653a0f6793942
Notes ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
ORCID 0000-0002-0914-1577
PMID 34291709
PQID 2554351314
PQPubID 23479
ParticipantIDs proquest_miscellaneous_2554351314
pubmed_primary_34291709
PublicationCentury 2000
PublicationDate 2021-08-00
20210801
PublicationDateYYYYMMDD 2021-08-01
PublicationDate_xml – month: 08
  year: 2021
  text: 2021-08-00
PublicationDecade 2020
PublicationPlace Singapore
PublicationPlace_xml – name: Singapore
PublicationTitle Journal of bioinformatics and computational biology
PublicationTitleAlternate J Bioinform Comput Biol
PublicationYear 2021
Score 2.4057002
SourceID proquest
pubmed
SourceType Aggregation Database
Index Database
StartPage 2150018
SubjectTerms Algorithms
Amino Acid Sequence
Computational Biology
Databases, Protein
DNA-Binding Proteins - genetics
Humans
Position-Specific Scoring Matrices
Support Vector Machine
Title DBP-GAPred: An intelligent method for prediction of DNA-binding proteins types by enhanced evolutionary profile features with ensemble learning
URI https://www.ncbi.nlm.nih.gov/pubmed/34291709
https://www.proquest.com/docview/2554351314
Volume 19
WOSCitedRecordID wos000692083900008&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText
inHoldings 1
isFullTextHit
isPrint