BBK* (Branch and Bound Over K*): A Provable and Efficient Ensemble-Based Protein Design Algorithm to Optimize Stability and Binding Affinity Over Large Sequence Spaces

Bibliographic Details
Published in:	Journal of computational biology Vol. 25; no. 7; p. 726
Main Authors:	Ojewole, Adegoke A; Jou, Jonathan D; Fowler, Vance G; Donald, Bruce R
Format:	Journal Article
Language:	English
Published:	United States 01.07.2018
Subjects:	Algorithms; Amino Acid Sequence - genetics; Computational Biology - methods; Entropy; Humans; Models, Molecular; Protein Conformation; Proteins - chemistry; Software; molecular ensembles; sublinear algorithms; structural biology; OSPREY; predicting binding affinity; protein design
ISSN:	1557-8666

Description
Summary:	Computational protein design (CPD) algorithms that compute binding affinity, K_a, search for sequences with an energetically favorable free energy of binding. Recent work shows that three principles improve the biological accuracy of CPD: ensemble-based design, continuous flexibility of backbone and side-chain conformations, and provable guarantees of accuracy with respect to the input. However, previous methods that use all three design principles are single-sequence (SS) algorithms, which are very costly: linear in the number of sequences and thus exponential in the number of simultaneously mutable residues. To address this computational challenge, we introduce BBK*, a new CPD algorithm whose key innovation is the multisequence (MS) bound: BBK* efficiently computes a single provable upper bound to approximate K_a for a combinatorial number of sequences, and avoids SS computation for all provably suboptimal sequences. Thus, to our knowledge, BBK* is the first provable, ensemble-based CPD algorithm to run in time sublinear in the number of sequences. Computational experiments on 204 protein design problems show that BBK* finds the tightest binding sequences while approximating K_a for up to 10^5-fold fewer sequences than the previous state-of-the-art algorithms, which require exhaustive enumeration of sequences. Furthermore, for 51 protein-ligand design problems, BBK* provably approximates K_a up to 1982-fold faster than the previous state-of-the-art iMinDEE/A*/K* algorithm. Therefore, BBK* not only accelerates protein designs that are possible with previous provable algorithms, but also efficiently performs designs that are too large for previous methods.
DOI:	10.1089/cmb.2017.0267
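
Illustration:	The summary describes pruning with one upper bound that covers many sequences at once, so that the expensive single-sequence (SS) evaluation is skipped for provably suboptimal sequences. The following is a minimal toy sketch of that general branch-and-bound idea in Python, not the BBK* algorithm itself. Every name and number in it (`ALPHABET`, `SINGLE`, `CLASH`, `score`, `upper_bound`, `best_sequence`) is a hypothetical stand-in, and the additive score with a clash penalty only plays the role of a real ensemble-based K_a approximation.

```python
import heapq

ALPHABET = "AVLK"  # hypothetical 4-letter residue alphabet

# SINGLE[i][r]: hypothetical score of residue r at mutable position i.
SINGLE = [
    {"A": 1.0, "V": 2.0, "L": 0.5, "K": 1.5},
    {"A": 0.2, "V": 2.5, "L": 1.0, "K": 0.1},
    {"A": 3.0, "V": 0.3, "L": 2.0, "K": 1.0},
    {"A": 0.5, "V": 0.4, "L": 0.6, "K": 2.2},
]
CLASH = 1.0  # hypothetical penalty when adjacent positions share a residue


def score(seq):
    """Stand-in for an expensive single-sequence (SS) evaluation.

    Also works on a prefix, in which case only assigned positions count.
    """
    total = sum(SINGLE[i][r] for i, r in enumerate(seq))
    total -= CLASH * sum(a == b for a, b in zip(seq, seq[1:]))
    return total


def upper_bound(prefix):
    """One bound that covers every full sequence extending `prefix`.

    Unassigned positions contribute their best per-position score, and
    penalties that involve them are dropped. Penalties only lower the
    score, so this value never underestimates any extension.
    """
    rest = range(len(prefix), len(SINGLE))
    return score(prefix) + sum(max(SINGLE[i].values()) for i in rest)


def best_sequence():
    """Best-first search over partial sequences, ordered by upper bound."""
    heap = [(-upper_bound(""), "")]  # max-heap via negated bounds
    ss_evaluations = 0
    while heap:
        neg_bound, prefix = heapq.heappop(heap)
        if len(prefix) == len(SINGLE):
            # A full sequence carries its exact score. Everything still
            # on the heap has a bound that is no higher, so it is optimal.
            return prefix, -neg_bound, ss_evaluations
        for residue in ALPHABET:
            child = prefix + residue
            if len(child) == len(SINGLE):
                ss_evaluations += 1  # the only place SS work is done
                heapq.heappush(heap, (-score(child), child))
            else:
                heapq.heappush(heap, (-upper_bound(child), child))
    raise RuntimeError("empty search space")


if __name__ == "__main__":
    seq, value, n_ss = best_sequence()
    total = len(ALPHABET) ** len(SINGLE)
    print(f"best={seq} score={value:.2f} SS evaluations={n_ss} of {total}")
```

Exhaustive enumeration in this toy would evaluate all 4^4 = 256 full sequences, which is the exponential growth in mutable positions that the summary attributes to SS algorithms. The search above evaluates a full sequence only when its parent's bound is the highest remaining, so subtrees whose bound falls below the best score are never expanded.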