Critical assessment of missense variant effect predictors on disease-relevant variant data
| Title: | Critical assessment of missense variant effect predictors on disease-relevant variant data |
|---|---|
| Authors: | Ruchir Rastogi, Ryan Chung, Sindy Li, Chang Li, Kyoungyeul Lee, Junwoo Woo, Dong-Wook Kim, Changwon Keum, Giulia Babbi, Pier Luigi Martelli, Castrense Savojardo, Rita Casadio, Kirsley Chennen, Thomas Weber, Olivier Poch, François Ancien, Gabriel Cia, Fabrizio Pucci, Daniele Raimondi, Wim Vranken, Marianne Rooman, Céline Marquet, Tobias Olenyi, Burkhard Rost, Gaia Andreoletti, Akash Kamandula, Yisu Peng, Constantina Bakolitsa, Matthew Mort, David N. Cooper, Timothy Bergquist, Vikas Pejaver, Xiaoming Liu, Predrag Radivojac, Steven E. Brenner, Nilah M. Ioannidis |
| Contributors: | Interuniversity Institute of Bioinformatics in Brussels, Department of Bio-engineering Sciences, Faculty of Sciences and Bioengineering Sciences, Structural Biology Brussels, Chemistry, Basic (bio-) Medical Sciences, Informatics and Applied Informatics, Federated labs AI and Robotics, IR Academic Unit |
| Source: | Hum Genet |
| Publisher Information: | Springer Science and Business Media LLC, 2024. |
| Publication Year: | 2024 |
| Subject Terms: | Gene Frequency; Mutation, Missense/genetics; Databases, Genetic; Mutation, Missense; Humans; Computational Biology; Genetic Predisposition to Disease; disease-relevant variant data; missense variant effect predictors; CAGI; Computational Biology/methods; ddc; Original Investigation; 3. Good health |
| Description: | Regular, systematic, and independent assessments of computational tools that are used to predict the pathogenicity of missense variants are necessary to evaluate their clinical and research utility and guide future improvements. The Critical Assessment of Genome Interpretation (CAGI) conducts the ongoing Annotate-All-Missense (Missense Marathon) challenge, in which missense variant effect predictors (also called variant impact predictors) are evaluated on missense variants added to disease-relevant databases following the prediction submission deadline. Here we assess predictors submitted to the CAGI 6 Annotate-All-Missense challenge, predictors commonly used in clinical genetics, and recently developed deep learning methods. We examine performance across a range of settings relevant for clinical and research applications, focusing on different subsets of the evaluation data as well as high-specificity and high-sensitivity regimes. Our evaluations reveal notable advances in current methods relative to older, well-cited tools in the field. While meta-predictors tend to outperform their constituent individual predictors, several newer individual predictors perform comparably to commonly used meta-predictors. Predictor performance varies between high-specificity and high-sensitivity regimes, highlighting that different methods may be optimal for different use cases. We also characterize two potential sources of bias. Predictors that incorporate allele frequency as a predictive feature tend to have reduced performance when distinguishing pathogenic variants from very rare benign variants, and predictors trained on pathogenicity labels from curated variant databases often inherit gene-level label imbalances. Our findings help illuminate the clinical and research utility of modern missense variant effect predictors and identify potential areas for future development. |
| Document Type: | Article; Other literature type |
| File Description: | application/pdf |
| Language: | English |
| ISSN: | 1432-1203; 0340-6717 |
| DOI: | 10.1007/s00439-025-02732-2 (Hum Genet article) |
| DOI: | 10.1101/2024.06.06.597828 (bioRxiv preprint) |
| Access URL: | https://pubmed.ncbi.nlm.nih.gov/40113603; https://pubmed.ncbi.nlm.nih.gov/38895200; https://biblio.vub.ac.be/vubir/critical-assessment-of-missense-variant-effect-predictors-on-diseaserelevant-variant-data(633006a4-aa1b-48c9-b84f-8c9838272f93).html; https://hdl.handle.net/11585/1016063; https://doi.org/10.1007/s00439-025-02732-2; https://mediatum.ub.tum.de/1792170 |
| Rights: | CC BY |
| Accession Number: | edsair.doi.dedup.....f51d440ba93916b8b3add059be87bbd8 |
| Database: | OpenAIRE |