Computational design of novel Cas9 PAM-interacting domains using evolution-based modelling and structural quality assessment

We present here an approach to protein design that combines (i) scarce functional information such as experimental data (ii) evolutionary information learned from a natural sequence variants and (iii) physics-grounded modeling. Using a Restricted Boltzmann Machine (RBM), we learn a sequence model of...

Celý popis

Uložené v:
Podrobná bibliografia
Vydané v:PLoS computational biology Ročník 19; číslo 11; s. e1011621
Hlavní autori: Malbranke, Cyril, Rostain, William, Depardieu, Florence, Cocco, Simona, Monasson, Rémi, Bikard, David
Médium: Journal Article
Jazyk:English
Vydavateľské údaje: United States Public Library of Science 01.11.2023
PLOS
Public Library of Science (PLoS)
Predmet:
ISSN:1553-7358, 1553-734X, 1553-7358
On-line prístup:Získať plný text
Tagy: Pridať tag
Žiadne tagy, Buďte prvý, kto otaguje tento záznam!
Popis
Shrnutí:We present here an approach to protein design that combines (i) scarce functional information such as experimental data (ii) evolutionary information learned from a natural sequence variants and (iii) physics-grounded modeling. Using a Restricted Boltzmann Machine (RBM), we learn a sequence model of a protein family. We use semi-supervision to leverage available functional information during the RBM training. We then propose a strategy to explore the protein representation space that can be informed by external models such as an empirical force-field method (FoldX). Our approach is applied to a domain of the Cas9 protein responsible for recognition of a short DNA motif. We experimentally assess the functionality of 71 variants generated to explore a range of RBM and FoldX energies. Sequences with as many as 50 differences (20% of the protein domain) to the wild-type retained functionality. Overall, 21/71 sequences designed with our method were functional. Interestingly, 6/71 sequences showed an improved activity in comparison with the original wild-type protein sequence. These results demonstrate the interest in further exploring the synergies between machine-learning of protein sequence representations and physics grounded modeling strategies informed by structural information.
Bibliografia:new_version
ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
content type line 23
PMCID: PMC10729993
The authors have declared that no competing interests exist.
ISSN:1553-7358
1553-734X
1553-7358
DOI:10.1371/journal.pcbi.1011621