Improving sequence-based modeling of protein families using secondary-structure quality assessment

Abstract Motivation Modeling of protein family sequence distribution from homologous sequence data recently received considerable attention, in particular for structure and function predictions, as well as for protein design. In particular, direct coupling analysis, a method to infer effective pairw...

Full description

Saved in:
Bibliographic Details
Published in:Bioinformatics (Oxford, England) Vol. 37; no. 22; pp. 4083 - 4090
Main Authors: Malbranke, Cyril, Bikard, David, Cocco, Simona, Monasson, Rémi
Format: Journal Article
Language:English
Published: England Oxford University Press 18.11.2021
Oxford Publishing Limited (England)
Subjects:
ISSN:1367-4803, 1367-4811, 1367-4811
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Abstract Motivation Modeling of protein family sequence distribution from homologous sequence data recently received considerable attention, in particular for structure and function predictions, as well as for protein design. In particular, direct coupling analysis, a method to infer effective pairwise interactions between residues, was shown to capture important structural constraints and to successfully generate functional protein sequences. Building on this and other graphical models, we introduce a new framework to assess the quality of the secondary structures of the generated sequences with respect to reference structures for the family. Results We introduce two scoring functions characterizing the likeliness of the secondary structure of a protein sequence to match a reference structure, called Dot Product and Pattern Matching. We test these scores on published experimental protein mutagenesis and design dataset, and show improvement in the detection of nonfunctional sequences. We also show that use of these scores help rejecting nonfunctional sequences generated by graphical models (Restricted Boltzmann Machines) learned from homologous sequence alignments. Availability and implementation Data and code available at https://github.com/CyrilMa/ssqa Supplementary information Supplementary data are available at Bioinformatics online.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
content type line 23
ISSN:1367-4803
1367-4811
1367-4811
DOI:10.1093/bioinformatics/btab442