Bubble: a fast single-cell RNA-seq imputation using an autoencoder constrained by bulk RNA-seq data

Abstract Single-cell RNA-sequencing technology (scRNA-seq) brings research to single-cell resolution. However, a major drawback of scRNA-seq is large sparsity, i.e. expressed genes with no reads due to technical noise or limited sequence depth during the scRNA-seq protocol. This phenomenon is also c...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Briefings in bioinformatics Jg. 24; H. 1
Hauptverfasser: Chen, Siqi, Yan, Xuhua, Zheng, Ruiqing, Li, Min
Format: Journal Article
Sprache:Englisch
Veröffentlicht: England Oxford University Press 19.01.2023
Oxford Publishing Limited (England)
Schlagworte:
ISSN:1467-5463, 1477-4054, 1477-4054
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Abstract Single-cell RNA-sequencing technology (scRNA-seq) brings research to single-cell resolution. However, a major drawback of scRNA-seq is large sparsity, i.e. expressed genes with no reads due to technical noise or limited sequence depth during the scRNA-seq protocol. This phenomenon is also called ‘dropout’ events, which likely affect downstream analyses such as differential expression analysis, the clustering and visualization of cell subpopulations, cellular trajectory inference, etc. Therefore, there is a need to develop a method to identify and impute these dropout events. We propose Bubble, which first identifies dropout events from all zeros based on expression rate and coefficient of variation of genes within cell subpopulation, and then leverages an autoencoder constrained by bulk RNA-seq data to only impute those values. Unlike other deep learning-based imputation methods, Bubble fuses the matched bulk RNA-seq data as a constraint to reduce the introduction of false positive signals. Using simulated and several real scRNA-seq datasets, we demonstrate that Bubble enhances the recovery of missing values, gene-to-gene and cell-to-cell correlations, and reduces the introduction of false positive signals. Regarding some crucial downstream analyses of scRNA-seq data, Bubble facilitates the identification of differentially expressed genes, improves the performance of clustering and visualization, and aids the construction of cellular trajectory. More importantly, Bubble provides fast and scalable imputation with minimal memory usage.
Bibliographie:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
content type line 23
ISSN:1467-5463
1477-4054
1477-4054
DOI:10.1093/bib/bbac580