BWM: A Novel, Provable, Ensemble-based Dynamic Programming Algorithm for Sparse Approximations of Computational Protein Design

Bibliographic details
Published in: Journal of Computational Biology, Volume 23, Issue 6, p. 413
Main authors: Jou, Jonathan D; Jain, Swati; Georgiev, Ivelin S; Donald, Bruce R
Format: Journal Article
Language: English
Publication details: United States, 1 June 2016
ISSN: 1557-8666
Abstract Sparse energy functions that ignore long-range interactions between residue pairs are frequently used by protein design algorithms to reduce computational cost. Current dynamic programming algorithms that fully exploit the optimal substructure produced by these energy functions compute only the global minimum energy conformation (GMEC). This disproportionately favors the sequence of a single, static conformation and overlooks better binding sequences with multiple low-energy conformations. Provable, ensemble-based algorithms such as A* avoid this problem, but A* cannot guarantee better performance than exhaustive enumeration. We propose a novel, provable, dynamic programming algorithm called Branch-Width Minimization* (BWM*) to enumerate a gap-free ensemble of conformations in order of increasing energy. Given a branch-decomposition of branch-width w for an n-residue protein design with at most q discrete side-chain conformations per residue, BWM* returns the sparse GMEC in O([Formula: see text]) time and enumerates each additional conformation in merely O([Formula: see text]) time. We define a new measure, Total Effective Search Space (TESS), which can be computed efficiently before BWM* or A* is run. We ran BWM* on 67 protein design problems and found that TESS discriminated between BWM*-efficient and A*-efficient cases with 100% accuracy. As predicted by TESS and validated experimentally, BWM* outperforms A* in 73% of the cases and computes the full ensemble or a close approximation faster than A*, enumerating each additional conformation in milliseconds. Unlike A*, the performance of BWM* can be predicted in polynomial time before running the algorithm, which gives protein designers the power to choose the most efficient algorithm for their particular design problem.
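The idea of a gap-free enumeration in order of increasing energy can be illustrated with a small Python sketch. This is not BWM* and does not use a branch-decomposition. It assumes a much simpler setting chosen only for illustration: a sparse energy function in which each residue interacts only with its neighbor in a chain. All names (`SELF_E`, `PAIR_E`, `tail_table`, `enumerate_by_energy`) and all energy values are hypothetical. A dynamic programming table gives the exact best completion of every partial assignment, and a best-first search guided by that table then yields conformations in nondecreasing energy, starting with the GMEC of the toy problem.

```python
import heapq
from itertools import islice, product

# Hypothetical toy problem: 3 residues, 2 rotamers each (values are made up).
# SELF_E[i][r]    : self energy of rotamer r at residue i
# PAIR_E[i][r][s] : pair energy between rotamer r at residue i and s at i + 1
# Only chain neighbors interact; all other pairs are ignored (sparse).
SELF_E = [[0, 2], [1, 0], [3, 1]]
PAIR_E = [[[0, 4], [1, 0]], [[2, 0], [0, 5]]]


def total_energy(conf, self_e, pair_e):
    """Energy of a complete conformation under the chain-sparse model."""
    e = sum(self_e[i][r] for i, r in enumerate(conf))
    e += sum(pair_e[i][conf[i]][conf[i + 1]] for i in range(len(conf) - 1))
    return e


def tail_table(self_e, pair_e):
    """tail[i][r] = minimum energy of residues i..n-1 given rotamer r at i."""
    n = len(self_e)
    tail = [list(row) for row in self_e]
    for i in range(n - 2, -1, -1):
        for r in range(len(self_e[i])):
            tail[i][r] += min(
                pair_e[i][r][s] + tail[i + 1][s]
                for s in range(len(self_e[i + 1]))
            )
    return tail


def enumerate_by_energy(self_e, pair_e):
    """Yield (energy, conformation) pairs in nondecreasing energy order."""
    n = len(self_e)
    tail = tail_table(self_e, pair_e)
    # Heap entries: (exact best energy of any completion, energy fixed so far
    # excluding the last residue's self term, partial assignment).
    heap = [(tail[0][r], 0, (r,)) for r in range(len(self_e[0]))]
    heapq.heapify(heap)
    while heap:
        bound, fixed, prefix = heapq.heappop(heap)
        k = len(prefix) - 1
        if k == n - 1:
            yield bound, prefix  # complete: the bound is the true energy
            continue
        last = prefix[k]
        for s in range(len(self_e[k + 1])):
            fixed2 = fixed + self_e[k][last] + pair_e[k][last][s]
            heapq.heappush(heap, (fixed2 + tail[k + 1][s], fixed2, prefix + (s,)))


def brute_force_energies(self_e, pair_e):
    """Reference: sorted energies of all conformations (exponential cost)."""
    ranges = [range(len(row)) for row in self_e]
    return sorted(total_energy(c, self_e, pair_e) for c in product(*ranges))


if __name__ == "__main__":
    # Print the three lowest-energy conformations of the toy problem.
    for energy, conf in islice(enumerate_by_energy(SELF_E, PAIR_E), 3):
        print(energy, conf)
```

Because the table stores exact completion costs, every heap pop either returns the next conformation or expands a prefix whose best completion is the next conformation, so no low-energy conformation is skipped. The chain assumption is what keeps the table small here; the abstract's algorithm instead handles general sparse residue interaction graphs through a branch-decomposition, which this sketch does not attempt.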
Authors (in byline order)
– Jou, Jonathan D; Department of Computer Science, Duke University, Durham, North Carolina
– Jain, Swati; Department of Computational Biology and Bioinformatics Program, Duke University, Durham, North Carolina
– Georgiev, Ivelin S; Department of Computer Science, Duke University, Durham, North Carolina
– Donald, Bruce R; Department of Chemistry, Duke University, Durham, North Carolina
BackLink https://www.ncbi.nlm.nih.gov/pubmed/26744898 (View this record in MEDLINE/PubMed)
CitedBy_id crossref_primary_10_1089_cmb_2017_0267
crossref_primary_10_1002_prot_25623
crossref_primary_10_1089_cmb_2016_0136
crossref_primary_10_3390_a14060168
crossref_primary_10_1089_cmb_2024_0669
crossref_primary_10_1371_journal_pcbi_1005346
crossref_primary_10_1016_j_cbpa_2018_07_022
crossref_primary_10_1002_jcc_25522
crossref_primary_10_1089_cmb_2015_0234
crossref_primary_10_1515_hsz_2021_0384
crossref_primary_10_1016_j_cels_2022_09_003
crossref_primary_10_1089_cmb_2022_0254
crossref_primary_10_1002_prot_26263
ContentType Journal Article
DOI 10.1089/cmb.2015.0194
Discipline Biology
Mathematics
EISSN 1557-8666
ExternalDocumentID 26744898
Genre Journal Article
Research Support, N.I.H., Extramural
GrantInformation_xml – fundername: NIGMS NIH HHS
  grantid: R01 GM078031
– fundername: NIGMS NIH HHS
  grantid: R01 GM073919
– fundername: NIGMS NIH HHS
  grantid: R01 GM073930
ISICitedReferencesCount 16
ISSN 1557-8666
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 6
Keywords sparse residue interaction graphs
ensemble-based algorithms
dynamic programming
provable algorithms
OSPREY
branch-decomposition
protein design
Language English
OpenAccessLink https://www.ncbi.nlm.nih.gov/pmc/articles/4904165
PMID 26744898
PublicationDate 2016-06-01
PublicationPlace United States
PublicationTitle Journal of computational biology
PublicationTitleAlternate J Comput Biol
PublicationYear 2016
StartPage 413
SubjectTerms Algorithms
Amino Acid Sequence
Computational Biology - methods
Models, Molecular
Protein Conformation
Proteins - chemistry
Software
Title BWM: A Novel, Provable, Ensemble-based Dynamic Programming Algorithm for Sparse Approximations of Computational Protein Design
URI https://www.ncbi.nlm.nih.gov/pubmed/26744898
https://www.proquest.com/docview/1795867023
Volume 23