LongGF: computational algorithm and software tool for fast and accurate detection of gene fusions by long-read transcriptome sequencing

Background Long-read RNA-Seq techniques can generate reads that encompass a large proportion or the entire mRNA/cDNA molecules, so they are expected to address inherited limitations of short-read RNA-Seq techniques that typically generate < 150 bp reads. However, there is a general lack of softwa...

Full description

Saved in:
Bibliographic Details
Published in:BMC genomics Vol. 21; no. Suppl 11; pp. 793 - 12
Main Authors: Liu, Qian, Hu, Yu, Stucky, Andres, Fang, Li, Zhong, Jiang F., Wang, Kai
Format: Journal Article
Language:English
Published: London BioMed Central 29.12.2020
BioMed Central Ltd
Springer Nature B.V
BMC
Subjects:
ISSN:1471-2164, 1471-2164
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Background Long-read RNA-Seq techniques can generate reads that encompass a large proportion or the entire mRNA/cDNA molecules, so they are expected to address inherited limitations of short-read RNA-Seq techniques that typically generate < 150 bp reads. However, there is a general lack of software tools for gene fusion detection from long-read RNA-seq data, which takes into account the high basecalling error rates and the presence of alignment errors. Results In this study, we developed a fast computational tool, LongGF, to efficiently detect candidate gene fusions from long-read RNA-seq data, including cDNA sequencing data and direct mRNA sequencing data. We evaluated LongGF on tens of simulated long-read RNA-seq datasets, and demonstrated its superior performance in gene fusion detection. We also tested LongGF on a Nanopore direct mRNA sequencing dataset and a PacBio sequencing dataset generated on a mixture of 10 cancer cell lines, and found that LongGF achieved better performance to detect known gene fusions over existing computational tools. Furthermore, we tested LongGF on a Nanopore cDNA sequencing dataset on acute myeloid leukemia, and pinpointed the exact location of a translocation (previously known in cytogenetic resolution) in base resolution, which was further validated by Sanger sequencing. Conclusions In summary, LongGF will greatly facilitate the discovery of candidate gene fusion events from long-read RNA-Seq data, especially in cancer samples. LongGF is implemented in C++ and is available at https://github.com/WGLab/LongGF .
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
content type line 23
ISSN:1471-2164
1471-2164
DOI:10.1186/s12864-020-07207-4