DBP-GAPred: An intelligent method for prediction of DNA-binding proteins types by enhanced evolutionary profile features with ensemble learning
DNA-binding proteins (DBPs) play an influential role in diverse biological activities such as DNA replication, slicing, repair, and transcription. Some DBPs are indispensable for understanding many types of human cancer (e.g., lung, breast, and liver cancer) and chronic diseases (e.g., AIDS/HIV, asth...
| Published in: | Journal of bioinformatics and computational biology Vol. 19; No. 4; p. 2150018 |
|---|---|
| Main Authors: | Barukab, Omar; Ali, Farman; Khan, Sher Afzal |
| Format: | Journal Article |
| Language: | English |
| Published: | Singapore, 1 August 2021 |
| Subjects: | |
| ISSN: | 1757-6334 |
| Online Access: | More details |
| Abstract | DNA-binding proteins (DBPs) play an influential role in diverse biological activities such as DNA replication, slicing, repair, and transcription. Some DBPs are indispensable for understanding many types of human cancer (e.g., lung, breast, and liver cancer) and chronic diseases (e.g., AIDS/HIV and asthma), while others are involved in the design of antibiotics, steroids, and anti-inflammatory drugs. These crucial processes are closely related to DBP type. DBPs are categorized into single-stranded DNA-binding proteins (ssDBPs) and double-stranded DNA-binding proteins (dsDBPs). Few computational predictors have been reported for discriminating ssDBPs from dsDBPs. However, owing to the limitations of the existing methods, an intelligent computational system is still highly desirable. In this work, features are extracted from protein sequences by extending the notions of dipeptide composition (DPC), evolutionary difference formula (EDF), and K-separated bigram (KSB) to the position-specific scoring matrix (PSSM). The most informative content of these features is encoded with a compression approach, the discrete cosine transform (DCT), and the model is trained with a support vector machine (SVM). Prediction performance is further boosted by a genetic algorithm (GA) ensemble strategy. The resulting predictor (DBP-GAPred) achieved accuracies 1.89%, 0.28%, and 6.63% higher than the best existing predictor on jackknife, 10-fold cross-validation, and independent dataset tests, respectively. These outcomes indicate that the method outperforms the existing predictors. |
|---|---|
| Author | Ali, Farman; Barukab, Omar; Khan, Sher Afzal |
| Author_xml | 1. Barukab, Omar (Faculty of Computing and Information Technology, King Abdulaziz University, Rabigh 21911 Jeddah, Saudi Arabia); 2. Ali, Farman (ORCID: 0000-0002-0914-1577; School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, P. R. China); 3. Khan, Sher Afzal (Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan) |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/34291709$$D View this record in MEDLINE/PubMed |
| CitedBy_id | crossref_primary_10_1016_j_artmed_2024_102860 crossref_primary_10_1016_j_chemolab_2022_104682 crossref_primary_10_1080_07391102_2024_2329777 crossref_primary_10_1080_07391102_2023_2243523 crossref_primary_10_1016_j_ab_2024_115603 crossref_primary_10_1155_2022_5483115 crossref_primary_10_1016_j_ymeth_2024_04_004 crossref_primary_10_1080_07391102_2023_2269280 crossref_primary_10_1109_ACCESS_2023_3321100 crossref_primary_10_1016_j_chemolab_2022_104516 crossref_primary_10_1007_s11831_023_09933_w crossref_primary_10_1016_j_chemolab_2022_104639 crossref_primary_10_1016_j_jocs_2024_102388 crossref_primary_10_1109_ACCESS_2023_3274601 crossref_primary_10_1016_j_chemolab_2022_104729 crossref_primary_10_1038_s41598_024_84146_0 crossref_primary_10_1016_j_ijbiomac_2024_136475 crossref_primary_10_1016_j_compbiomed_2022_105533 crossref_primary_10_1038_s41598_022_24501_1 crossref_primary_10_1016_j_compbiomed_2021_105006 crossref_primary_10_1016_j_bspc_2022_103856 crossref_primary_10_1016_j_ijbiomac_2025_143844 crossref_primary_10_1038_s41598_022_09484_3 crossref_primary_10_1016_j_compbiomed_2022_106311 |
| ContentType | Journal Article |
| DBID | CGR CUY CVF ECM EIF NPM 7X8 |
| DOI | 10.1142/S0219720021500189 |
| DatabaseName | Medline MEDLINE MEDLINE (Ovid) MEDLINE MEDLINE PubMed MEDLINE - Academic |
| DatabaseTitle | MEDLINE Medline Complete MEDLINE with Full Text PubMed MEDLINE (Ovid) MEDLINE - Academic |
| DatabaseTitleList | MEDLINE MEDLINE - Academic |
| Database_xml | – sequence: 1 dbid: NPM name: PubMed url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 2 dbid: 7X8 name: MEDLINE - Academic url: https://search.proquest.com/medline sourceTypes: Aggregation Database |
| DeliveryMethod | no_fulltext_linktorsrc |
| EISSN | 1757-6334 |
| ExternalDocumentID | 34291709 |
| Genre | Research Support, Non-U.S. Gov't Journal Article |
| GroupedDBID | CGR CUY CVF ECM EIF NPM 7X8 |
| ID | FETCH-LOGICAL-c3709-63ddc46e05f2fef06d5fb6c36b04aa46fc6f3c7f394c0b089d97653a0f6793942 |
| IEDL.DBID | 7X8 |
| ISICitedReferencesCount | 33 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000692083900008&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 1757-6334 |
| IngestDate | Fri Jul 11 07:58:14 EDT 2025 Thu Jan 02 22:56:43 EST 2025 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 4 |
| Keywords | genetic algorithm; discrete cosine transform; support vector machine; position-specific scoring matrix; DNA-binding proteins |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-c3709-63ddc46e05f2fef06d5fb6c36b04aa46fc6f3c7f394c0b089d97653a0f6793942 |
| Notes | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23 |
| ORCID | 0000-0002-0914-1577 |
| PMID | 34291709 |
| PQID | 2554351314 |
| PQPubID | 23479 |
| ParticipantIDs | proquest_miscellaneous_2554351314 pubmed_primary_34291709 |
| PublicationCentury | 2000 |
| PublicationDate | 2021-08-00 20210801 |
| PublicationDateYYYYMMDD | 2021-08-01 |
| PublicationDate_xml | – month: 08 year: 2021 text: 2021-08-00 |
| PublicationDecade | 2020 |
| PublicationPlace | Singapore |
| PublicationPlace_xml | – name: Singapore |
| PublicationTitle | Journal of bioinformatics and computational biology |
| PublicationTitleAlternate | J Bioinform Comput Biol |
| PublicationYear | 2021 |
| Score | 2.4057002 |
| SourceID | proquest pubmed |
| SourceType | Aggregation Database Index Database |
| StartPage | 2150018 |
| SubjectTerms | Algorithms; Amino Acid Sequence; Computational Biology; Databases, Protein; DNA-Binding Proteins - genetics; Humans; Position-Specific Scoring Matrices; Support Vector Machine |
| Title | DBP-GAPred: An intelligent method for prediction of DNA-binding proteins types by enhanced evolutionary profile features with ensemble learning |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/34291709 https://www.proquest.com/docview/2554351314 |
| Volume | 19 |
| WOSCitedRecordID | wos000692083900008&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
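
The abstract describes extending K-separated bigram (KSB) features to the PSSM and compressing the result with a discrete cosine transform (DCT). The following is a minimal sketch of that idea, not the authors' implementation. It assumes the commonly used KSB definition (summing products of PSSM scores at positions i and i + k), applies a one-dimensional DCT-II to the flattened feature vector, and keeps only the leading coefficients. The function names `ksb_pssm`, `dct2_1d`, and `compress_features`, the random stand-in PSSM, and the choice of 50 retained coefficients are all hypothetical.

```python
import numpy as np


def ksb_pssm(pssm: np.ndarray, k: int = 1) -> np.ndarray:
    """K-separated bigram features from an L x 20 PSSM (assumed definition).

    Entry (m, n) sums pssm[i, m] * pssm[i + k, n] over all valid positions i,
    so every protein maps to a 400-dimensional vector regardless of length L.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if pssm.ndim != 2 or pssm.shape[0] <= k:
        raise ValueError("PSSM must be 2-D with more than k rows")
    left = pssm[:-k]   # rows i
    right = pssm[k:]   # rows i + k
    return (left.T @ right).ravel()


def dct2_1d(x: np.ndarray) -> np.ndarray:
    """Unnormalized DCT-II: X[j] = sum_n x[n] * cos(pi / N * (n + 0.5) * j)."""
    n = x.shape[0]
    idx = np.arange(n)
    basis = np.cos(np.pi / n * (idx[None, :] + 0.5) * idx[:, None])
    return basis @ x


def compress_features(features: np.ndarray, keep: int) -> np.ndarray:
    """Keep the first `keep` (lowest-frequency) DCT coefficients."""
    return dct2_1d(features)[:keep]


# Illustrative usage with a random stand-in for a real PSSM (120 residues).
rng = np.random.default_rng(0)
fake_pssm = rng.normal(size=(120, 20))
compressed = compress_features(ksb_pssm(fake_pssm, k=2), keep=50)
print(compressed.shape)  # (50,)
```

A vector produced this way could then be passed to an SVM classifier. The GA-based ensemble step mentioned in the abstract is not sketched here, because the record gives no details about how it is configured.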