Refined variant calling pipeline on RNA-seq data of breast cancer cell lines without matched-normal samples

Bibliographic Details
Title: Refined variant calling pipeline on RNA-seq data of breast cancer cell lines without matched-normal samples
Authors: Sonja Eberth, Julia Koblitz, Laura Steenpaß, Claudia Pommerenke
Source: BMC Research Notes, Vol 18, Iss 1, Pp 1-7 (2025)
Publisher Information: BMC, 2025.
Publication Year: 2025
Collection: LCC:Medicine; LCC:Biology (General); LCC:Science (General)
Subject Terms: Variant calling, RNA-seq, Breast cancer, Cancer cell lines, COSMIC, DSMZCellDive, Medicine, Biology (General), QH301-705.5, Science (General), Q1-390
Description: Abstract. Objective: RNA-seq delivers valuable insights into both the transcriptional patterns and the mutational landscapes of transcribed genes. However, because tumour cell lines frequently lack a matched-normal counterpart, variant calling without the paired normal sample remains challenging. To exclude common genetic variation without a matched-normal control, filtering strategies need to be developed that identify tumour-relevant variants in cell lines. Results: Here, variants of 29 breast cancer cell lines were called on RNA-seq data via HaplotypeCaller. Low read depth sites, RNA-edit sites, and low complexity regions in coding regions were excluded. Common variants were filtered using 1000 Genomes, gnomAD, and dbSNP data. Starting from hundreds of thousands of single nucleotide variants and small insertions and deletions, about a thousand variants remained per sample after filtering. Extracted variants were validated against the Catalogue of Somatic Mutations in Cancer (COSMIC) for 10 cell lines included in both data sets. Approximately half of the COSMIC variants were successfully called. Importantly, missing variants could mainly be attributed to sites with low read depth. Moreover, the filtered variants included all 10 Cancer Gene Census COSMIC variants, a condensed hallmark variant set.
Document Type: article
File Description: electronic resource
Language: English
ISSN: 1756-0500
Relation: https://doaj.org/toc/1756-0500
DOI: 10.1186/s13104-025-07140-3
Access URL: https://doaj.org/article/c47bc3df040b49aa8e03a445be5bb583
Accession Number: edsdoj.47bc3df040b49aa8e03a445be5bb583
Database: Directory of Open Access Journals
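Illustrative sketch: the filtering and validation steps summarised in the abstract could look roughly like the Python below. This is not the authors' pipeline. The `Variant` record, the function names, and both thresholds (`MIN_DEPTH`, `MAX_POPULATION_AF`) are assumptions made for illustration, since the abstract does not state the cut-offs used. Population frequency from 1000 Genomes, gnomAD, and dbSNP is collapsed into a single `population_af` value for simplicity.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple

# Hypothetical thresholds: the abstract does not give the actual cut-offs.
MIN_DEPTH = 10
MAX_POPULATION_AF = 0.01

VariantKey = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one called SNV or small indel."""
    chrom: str
    pos: int
    ref: str
    alt: str
    depth: int                      # read depth at the site
    population_af: Optional[float]  # highest known population allele frequency, None if unreported
    is_rna_edit_site: bool          # overlaps a known RNA-editing site
    in_low_complexity: bool         # falls in a low complexity region

    @property
    def key(self) -> VariantKey:
        return (self.chrom, self.pos, self.ref, self.alt)


def passes_filters(v: Variant,
                   min_depth: int = MIN_DEPTH,
                   max_af: float = MAX_POPULATION_AF) -> bool:
    """Return True if the variant survives all exclusion steps."""
    if v.depth < min_depth:
        return False  # low read depth site
    if v.is_rna_edit_site or v.in_low_complexity:
        return False  # RNA-edit site or low complexity region
    if v.population_af is not None and v.population_af > max_af:
        return False  # common genetic variation
    return True


def filter_variants(variants: Iterable[Variant]) -> List[Variant]:
    """Keep only variants that pass every filter."""
    return [v for v in variants if passes_filters(v)]


def recovered_fraction(called: Iterable[Variant],
                       reference: Set[VariantKey]) -> float:
    """Fraction of reference variants (e.g. a COSMIC subset) present among the called variants."""
    if not reference:
        return 0.0
    called_keys = {v.key for v in called}
    return len(reference & called_keys) / len(reference)


if __name__ == "__main__":
    # Made-up coordinates, for demonstration only.
    demo = [
        Variant("chr1", 1000, "C", "T", 50, None, False, False),
        Variant("chr1", 2000, "A", "G", 3, None, False, False),
    ]
    kept = filter_variants(demo)
    print(len(kept), recovered_fraction(kept, {v.key for v in demo}))
```

In this toy run the second variant is dropped for low read depth, which mirrors the abstract's observation that missed COSMIC variants were mainly attributable to low-depth sites.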