Critical assessment of missense variant effect predictors on disease-relevant variant data

Bibliographic Details
Title: Critical assessment of missense variant effect predictors on disease-relevant variant data
Authors: Ruchir Rastogi, Ryan Chung, Sindy Li, Chang Li, Kyoungyeul Lee, Junwoo Woo, Dong-Wook Kim, Changwon Keum, Giulia Babbi, Pier Luigi Martelli, Castrense Savojardo, Rita Casadio, Kirsley Chennen, Thomas Weber, Olivier Poch, François Ancien, Gabriel Cia, Fabrizio Pucci, Daniele Raimondi, Wim Vranken, Marianne Rooman, Céline Marquet, Tobias Olenyi, Burkhard Rost, Gaia Andreoletti, Akash Kamandula, Yisu Peng, Constantina Bakolitsa, Matthew Mort, David N. Cooper, Timothy Bergquist, Vikas Pejaver, Xiaoming Liu, Predrag Radivojac, Steven E. Brenner, Nilah M. Ioannidis
Contributors: Interuniversity Institute of Bioinformatics in Brussels, Department of Bio-engineering Sciences, Faculty of Sciences and Bioengineering Sciences, Structural Biology Brussels, Chemistry, Basic (bio-) Medical Sciences, Informatics and Applied Informatics, Federated labs AI and Robotics, IR Academic Unit
Source: Hum Genet
Publisher Information: Springer Science and Business Media LLC, 2024.
Publication Year: 2024
Subject Terms: Gene Frequency; Mutation, Missense/genetics; Databases, Genetic; Mutation, Missense; Humans; Computational Biology; Genetic Predisposition to Disease; disease-relevant variant data; missense variant effect predictors; CAGI; Computational Biology/methods; ddc; Original Investigation; 3. Good health
Description: Regular, systematic, and independent assessments of computational tools that are used to predict the pathogenicity of missense variants are necessary to evaluate their clinical and research utility and guide future improvements. The Critical Assessment of Genome Interpretation (CAGI) conducts the ongoing Annotate-All-Missense (Missense Marathon) challenge, in which missense variant effect predictors (also called variant impact predictors) are evaluated on missense variants added to disease-relevant databases following the prediction submission deadline. Here we assess predictors submitted to the CAGI 6 Annotate-All-Missense challenge, predictors commonly used in clinical genetics, and recently developed deep learning methods. We examine performance across a range of settings relevant for clinical and research applications, focusing on different subsets of the evaluation data as well as high-specificity and high-sensitivity regimes. Our evaluations reveal notable advances in current methods relative to older, well-cited tools in the field. While meta-predictors tend to outperform their constituent individual predictors, several newer individual predictors perform comparably to commonly used meta-predictors. Predictor performance varies between high-specificity and high-sensitivity regimes, highlighting that different methods may be optimal for different use cases. We also characterize two potential sources of bias. Predictors that incorporate allele frequency as a predictive feature tend to have reduced performance when distinguishing pathogenic variants from very rare benign variants, and predictors trained on pathogenicity labels from curated variant databases often inherit gene-level label imbalances. Our findings help illuminate the clinical and research utility of modern missense variant effect predictors and identify potential areas for future development.
Document Type: Article; Other literature type
File Description: application/pdf
Language: English
ISSN: 1432-1203; 0340-6717
DOI: 10.1007/s00439-025-02732-2
DOI: 10.1101/2024.06.06.597828
Access URL: https://pubmed.ncbi.nlm.nih.gov/40113603
https://pubmed.ncbi.nlm.nih.gov/38895200
https://biblio.vub.ac.be/vubir/critical-assessment-of-missense-variant-effect-predictors-on-diseaserelevant-variant-data(633006a4-aa1b-48c9-b84f-8c9838272f93).html
https://hdl.handle.net/11585/1016063
https://doi.org/10.1007/s00439-025-02732-2
https://mediatum.ub.tum.de/1792170
Rights: CC BY
Accession Number: edsair.doi.dedup.....f51d440ba93916b8b3add059be87bbd8
Database: OpenAIRE