BWM*: A Novel, Provable, Ensemble-based Dynamic Programming Algorithm for Sparse Approximations of Computational Protein Design
Sparse energy functions that ignore long range interactions between residue pairs are frequently used by protein design algorithms to reduce computational cost. Current dynamic programming algorithms that fully exploit the optimal substructure produced by these energy functions only compute the GMEC...
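As a rough illustration of the terms used in the summary, the sketch below builds a toy sparse energy function (pairwise terms beyond a distance cutoff are dropped) and lists every conformation in order of increasing energy, so that the first entry is the sparse GMEC. This is plain brute-force enumeration, not the BWM* algorithm; all names, energies, distances, and the cutoff are hypothetical values chosen only for the example.

```python
import itertools

def sparse_energy(conf, self_e, pair_e, dist, cutoff):
    """Energy of one conformation, ignoring residue pairs farther apart than cutoff."""
    total = sum(self_e[i][r] for i, r in enumerate(conf))
    for i, j in itertools.combinations(range(len(conf)), 2):
        if dist[i][j] <= cutoff:  # long-range pairs are dropped (sparse approximation)
            total += pair_e[(i, j)][conf[i]][conf[j]]
    return total

def enumerate_by_energy(self_e, pair_e, dist, cutoff):
    """Brute-force list of (energy, conformation), lowest energy first."""
    choices = [range(len(rotamers)) for rotamers in self_e]
    scored = [(sparse_energy(c, self_e, pair_e, dist, cutoff), c)
              for c in itertools.product(*choices)]
    return sorted(scored)

# Hypothetical toy design: 3 residues, 2 rotamers each, integer energy units.
self_e = [[0, 10], [5, 0], [0, 2]]
pair_e = {
    (0, 1): [[-10, 0], [0, -5]],
    (1, 2): [[0, -20], [-3, 0]],
    (0, 2): [[50, 0], [0, 50]],  # residues 0 and 2 are far apart, so this term is ignored
}
dist = [[0, 4, 12], [4, 0, 4], [12, 4, 0]]  # assumed inter-residue distances

ranked = enumerate_by_energy(self_e, pair_e, dist, cutoff=8)
gmec_energy, gmec_conf = ranked[0]  # lowest-energy conformation under the sparse function
```

Exhaustive enumeration like this grows as q^n for n residues with q conformations each, which is the cost that provable algorithms such as the one described here aim to avoid.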
| Published in: | Journal of computational biology, Vol. 23, No. 6, p. 413 |
|---|---|
| Main authors: | Jou, Jonathan D; Jain, Swati; Georgiev, Ivelin S; Donald, Bruce R |
| Format: | Journal Article |
| Language: | English |
| Published: | United States, 01.06.2016 |
| ISSN: | 1557-8666 |
| Abstract | Sparse energy functions that ignore long range interactions between residue pairs are frequently used by protein design algorithms to reduce computational cost. Current dynamic programming algorithms that fully exploit the optimal substructure produced by these energy functions only compute the GMEC. This disproportionately favors the sequence of a single, static conformation and overlooks better binding sequences with multiple low-energy conformations. Provable, ensemble-based algorithms such as A* avoid this problem, but A* cannot guarantee better performance than exhaustive enumeration. We propose a novel, provable, dynamic programming algorithm called Branch-Width Minimization* (BWM*) to enumerate a gap-free ensemble of conformations in order of increasing energy. Given a branch-decomposition of branch-width w for an n-residue protein design with at most q discrete side-chain conformations per residue, BWM* returns the sparse GMEC in O([Formula: see text]) time and enumerates each additional conformation in merely O([Formula: see text]) time. We define a new measure, Total Effective Search Space (TESS), which can be computed efficiently a priori before BWM* or A* is run. We ran BWM* on 67 protein design problems and found that TESS discriminated between BWM*-efficient and A*-efficient cases with 100% accuracy. As predicted by TESS and validated experimentally, BWM* outperforms A* in 73% of the cases and computes the full ensemble or a close approximation faster than A*, enumerating each additional conformation in milliseconds. Unlike A*, the performance of BWM* can be predicted in polynomial time before running the algorithm, which gives protein designers the power to choose the most efficient algorithm for their particular design problem. |
|---|---|
| Author | Donald, Bruce R; Jain, Swati; Jou, Jonathan D; Georgiev, Ivelin S |
| Author_xml | 1. Jou, Jonathan D (1: Department of Computer Science, Duke University, Durham, North Carolina); 2. Jain, Swati (3: Department of Computational Biology and Bioinformatics Program, Duke University, Durham, North Carolina); 3. Georgiev, Ivelin S (1: Department of Computer Science, Duke University, Durham, North Carolina); 4. Donald, Bruce R (4: Department of Chemistry, Duke University, Durham, North Carolina) |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/26744898$$D View this record in MEDLINE/PubMed |
| CitedBy_id | crossref_primary_10_1089_cmb_2017_0267 crossref_primary_10_1002_prot_25623 crossref_primary_10_1089_cmb_2016_0136 crossref_primary_10_3390_a14060168 crossref_primary_10_1089_cmb_2024_0669 crossref_primary_10_1371_journal_pcbi_1005346 crossref_primary_10_1016_j_cbpa_2018_07_022 crossref_primary_10_1002_jcc_25522 crossref_primary_10_1089_cmb_2015_0234 crossref_primary_10_1515_hsz_2021_0384 crossref_primary_10_1016_j_cels_2022_09_003 crossref_primary_10_1089_cmb_2022_0254 crossref_primary_10_1002_prot_26263 |
| ContentType | Journal Article |
| DBID | CGR CUY CVF ECM EIF NPM 7X8 |
| DOI | 10.1089/cmb.2015.0194 |
| DatabaseName | Medline MEDLINE MEDLINE (Ovid) MEDLINE MEDLINE PubMed MEDLINE - Academic |
| DatabaseTitle | MEDLINE Medline Complete MEDLINE with Full Text PubMed MEDLINE (Ovid) MEDLINE - Academic |
| DatabaseTitleList | MEDLINE MEDLINE - Academic |
| Database_xml | – sequence: 1 dbid: NPM name: PubMed url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 2 dbid: 7X8 name: MEDLINE - Academic url: https://search.proquest.com/medline sourceTypes: Aggregation Database |
| DeliveryMethod | no_fulltext_linktorsrc |
| Discipline | Biology Mathematics |
| EISSN | 1557-8666 |
| ExternalDocumentID | 26744898 |
| Genre | Journal Article Research Support, N.I.H., Extramural |
| GrantInformation_xml | NIGMS NIH HHS: R01 GM078031; NIGMS NIH HHS: R01 GM073919; NIGMS NIH HHS: R01 GM073930 |
| ISICitedReferencesCount | 16 |
| ISSN | 1557-8666 |
| IngestDate | Wed Oct 01 14:46:23 EDT 2025 Thu Apr 03 07:03:48 EDT 2025 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 6 |
| Keywords | sparse residue interaction graphs; ensemble-based algorithms; dynamic programming; provable algorithms; OSPREY; branch-decomposition; protein design |
| Language | English |
| OpenAccessLink | https://www.ncbi.nlm.nih.gov/pmc/articles/4904165 |
| PMID | 26744898 |
| PQID | 1795867023 |
| PQPubID | 23479 |
| ParticipantIDs | proquest_miscellaneous_1795867023 pubmed_primary_26744898 |
| PublicationCentury | 2000 |
| PublicationDate | 2016-06-00 20160601 |
| PublicationDateYYYYMMDD | 2016-06-01 |
| PublicationDate_xml | – month: 06 year: 2016 text: 2016-06-00 |
| PublicationDecade | 2010 |
| PublicationPlace | United States |
| PublicationPlace_xml | – name: United States |
| PublicationTitle | Journal of computational biology |
| PublicationTitleAlternate | J Comput Biol |
| PublicationYear | 2016 |
| SSID | ssj0013607 |
| Score | 2.2379644 |
| Snippet | Sparse energy functions that ignore long range interactions between residue pairs are frequently used by protein design algorithms to reduce computational... |
| SourceID | proquest pubmed |
| SourceType | Aggregation Database Index Database |
| StartPage | 413 |
| SubjectTerms | Algorithms; Amino Acid Sequence; Computational Biology - methods; Models, Molecular; Protein Conformation; Proteins - chemistry; Software |
| Title | BWM*: A Novel, Provable, Ensemble-based Dynamic Programming Algorithm for Sparse Approximations of Computational Protein Design |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/26744898 https://www.proquest.com/docview/1795867023 |
| Volume | 23 |