A systematic comparison of pharmacogene star allele calling bioinformatics algorithms: a focus on CYP2D6 genotyping

Bibliographic Details
Published in: npj Genomic Medicine, Vol. 5, no. 1, p. 30
Main Authors: Twesigomwe, David, Wright, Galen E. B., Drögemöller, Britt I., da Rocha, Jorge, Lombard, Zané, Hazelhurst, Scott
Format: Journal Article
Language: English
Published: London: Nature Publishing Group UK, 03.08.2020
ISSN: 2056-7944
Description
Summary: Genetic variation in genes encoding cytochrome P450 enzymes has important clinical implications for drug metabolism. Bioinformatics algorithms that genotype these highly polymorphic genes from high-throughput sequencing data and automate phenotype prediction have recently been developed. The CYP2D6 gene is often used as a model when validating these algorithms because of its clinical importance, high polymorphism, and structural variation. However, validation is often limited to common star alleles owing to the scarcity of reference datasets, and no comprehensive benchmark of these algorithms has been published to date. We performed a systematic comparison of three star allele calling algorithms using 4618 simulations as well as 75 whole-genome sequence samples from the GeT-RM project. Overall, we found that Aldy and Astrolabe are better suited than Stargazer to calling both common and rare diplotypes; Stargazer's performance is affected by population structure. Aldy was the best-performing algorithm for calling CYP2D6 structural variants, followed by Stargazer, whereas Astrolabe had limitations, especially in calling hybrid rearrangements. We found that ensemble genotyping, which takes a consensus of the genotypes called by all three algorithms, achieves higher haplotype concordance but is prone to ambiguous calls whenever the tools disagree completely. We also evaluated the effects of sequencing coverage and indel misalignment on genotyping accuracy. This account of the strengths and limitations of these algorithms is directly relevant to clinicians and researchers in the pharmacogenomics and precision medicine communities who want to haplotype CYP2D6 and other pharmacogenes using high-throughput sequencing data.
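The ensemble genotyping idea summarised above can be illustrated with a short sketch. The Python below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each tool's output has already been reduced to a diplotype string such as `*1/*4`, treats a diplotype as agreed when a strict majority of tools (two of the three) report it, and returns no call when all three disagree. The helper names (`normalise`, `consensus`, `haplotype_concordance`) and the per-allele concordance count are hypothetical choices for illustration; the study's own consensus and concordance rules may differ.

```python
from __future__ import annotations

from collections import Counter
from typing import Optional


def normalise(diplotype: str) -> tuple[str, str]:
    """Turn a diplotype such as '*4/*1' into an order-independent pair ('*1', '*4').

    Assumes exactly one '/' separating the two haplotypes (hypothetical format).
    """
    first, second = (part.strip() for part in diplotype.split("/"))
    return tuple(sorted((first, second)))


def consensus(calls: dict[str, str]) -> Optional[tuple[str, str]]:
    """Majority-vote diplotype across callers; None when no strict majority exists."""
    if not calls:
        return None
    counts = Counter(normalise(d) for d in calls.values())
    diplotype, votes = counts.most_common(1)[0]
    # With three tools, a strict majority means at least two agree.
    return diplotype if votes > len(calls) / 2 else None


def haplotype_concordance(called: str, truth: str) -> int:
    """Count star alleles (0, 1 or 2) shared between a call and the truth diplotype."""
    # Counter '&' keeps the minimum count of each allele, i.e. the multiset overlap.
    shared = Counter(normalise(called)) & Counter(normalise(truth))
    return sum(shared.values())


# Hypothetical calls for one sample from the three tools compared in the study.
calls = {"Aldy": "*1/*4", "Astrolabe": "*4/*1", "Stargazer": "*1/*41"}
print(consensus(calls))                          # ('*1', '*4')
print(consensus({"Aldy": "*1/*2", "Astrolabe": "*1/*4", "Stargazer": "*2/*4"}))  # None
print(haplotype_concordance("*1/*41", "*1/*4"))  # 1
```

Normalising each diplotype to a sorted pair means `*4/*1` and `*1/*4` count as the same call. The second example shows the ambiguity the summary describes: when every tool reports a different diplotype, no majority exists and the sketch returns no call. Splitting on `/` assumes one slash per diplotype, which holds for notations such as `*1/*2x2` or `*36+*10/*1` but has not been checked against every tool's output format.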
DOI: 10.1038/s41525-020-0135-2