Model-based analysis of sample index hopping reveals its widespread artifacts in multiplexed single-cell RNA-sequencing

Bibliographic Details
Published in: Nature Communications Vol. 11; no. 1; pp. 2704 - 8
Main Authors: Farouni, Rick, Djambazian, Haig, Ferri, Lorenzo E., Ragoussis, Jiannis, Najafabadi, Hamed S.
Format: Journal Article
Language: English
Published: London: Nature Publishing Group UK, 01.06.2020
Nature Publishing Group
Nature Portfolio
ISSN: 2041-1723
Description
Summary: Index hopping is the main cause of incorrect sample assignment of sequencing reads in multiplexed pooled libraries. We introduce a statistical model for estimating the sample index-hopping rate in multiplexed droplet-based single-cell RNA-seq data and for probabilistic inference of the true sample of origin of hopped reads. We analyze several datasets and estimate the sample index-hopping probability to range from 0.003 to 0.009, a small number that counter-intuitively gives rise to a large fraction of phantom molecules: the fraction exceeds 8% in more than 25% of samples and reaches as high as 85% in low-complexity samples. Phantom molecules lead to widespread complications in downstream analyses, including transcriptome mixing across cells, emergence of phantom copies of cells from other samples, and misclassification of empty droplets as cells. We demonstrate that our approach can correct for these artifacts by accurately purging the majority of phantom molecules from the data.

Sample index hopping results in various artifacts in multiplexed scRNA-seq experiments. Here, the authors introduce a statistical model to estimate the sample index-hopping rate in droplet-based scRNA-seq data and show that the artifacts can be corrected by purging phantom molecules from the data.
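The counter-intuitive point in the summary, that a hopping probability below 1% can still leave a sample dominated by phantom material, can be illustrated with simple arithmetic. The sketch below is not the authors' statistical model. It is a toy calculation at the level of reads, under assumptions introduced here purely for illustration: every read hops independently with the same probability `hop_prob`, a hopped read lands in any of the other samples with equal chance, and the function name and the read counts are hypothetical.

```python
def expected_phantom_read_fraction(reads_per_sample, hop_prob):
    """Expected fraction of reads observed in each sample that hopped in
    from another sample.

    Toy assumptions (not taken from the paper): each read hops
    independently with probability hop_prob, and a hopped read is equally
    likely to land in any of the other samples.
    """
    n_samples = len(reads_per_sample)
    if n_samples < 2:
        raise ValueError("need at least two multiplexed samples")
    total_reads = sum(reads_per_sample)

    fractions = []
    for own_reads in reads_per_sample:
        # Reads that originate in this sample and keep their index.
        staying = own_reads * (1 - hop_prob)
        # Reads from all other samples that hop into this one.
        incoming = (total_reads - own_reads) * hop_prob / (n_samples - 1)
        fractions.append(incoming / (staying + incoming))
    return fractions


# Hypothetical lane: three deeply sequenced samples and one
# low-complexity sample that contributes 100x fewer reads.
reads = [100_000_000, 100_000_000, 100_000_000, 1_000_000]
for sample_id, fraction in enumerate(
    expected_phantom_read_fraction(reads, hop_prob=0.005)
):
    print(f"sample {sample_id}: expected phantom read fraction = {fraction:.4f}")
```

In this toy setting the large samples receive few hopped reads relative to their own output, while the small sample receives hopped reads from a pool hundreds of times larger than itself, so its phantom fraction is far above the per-read hopping probability. The calculation counts reads only and does not model molecules, cells, or the inference of the true sample of origin described in the summary.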
DOI: 10.1038/s41467-020-16522-z