Learning interpretable cellular and gene signature embeddings from single-cell transcriptomic data
The advent of single-cell RNA sequencing (scRNA-seq) technologies has revolutionized transcriptomic studies. However, large-scale integrative analysis of scRNA-seq data remains a challenge largely due to unwanted batch effects and the limited transferabilty, interpretability, and scalability of the...
Saved in:
| Published in: | Nature communications Vol. 12; no. 1; pp. 5261 - 15 |
|---|---|
| Main Authors: | , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: |
London
Nature Publishing Group UK
06.09.2021
Nature Publishing Group Nature Portfolio |
| Subjects: | |
| ISSN: | 2041-1723, 2041-1723 |
| Online Access: | Get full text |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| Summary: | The advent of single-cell RNA sequencing (scRNA-seq) technologies has revolutionized transcriptomic studies. However, large-scale integrative analysis of scRNA-seq data remains a challenge largely due to unwanted batch effects and the limited transferabilty, interpretability, and scalability of the existing computational methods. We present single-cell Embedded Topic Model (scETM). Our key contribution is the utilization of a transferable neural-network-based encoder while having an interpretable linear decoder via a matrix tri-factorization. In particular, scETM simultaneously learns an encoder network to infer cell type mixture and a set of highly interpretable gene embeddings, topic embeddings, and batch-effect linear intercepts from multiple scRNA-seq datasets. scETM is scalable to over 10
6
cells and confers remarkable cross-tissue and cross-species zero-shot transfer-learning performance. Using gene set enrichment analysis, we find that scETM-learned topics are enriched in biologically meaningful and disease-related pathways. Lastly, scETM enables the incorporation of known gene sets into the gene embeddings, thereby directly learning the associations between pathways and topics via the topic embeddings.
Computational single-cell RNA-seq analyses often face challenges in scalability, model interpretability, and confounders. Here, we show a new model to address these challenges by learning meaningful embeddings from the data that simultaneously refine gene signatures and cell functions in diverse conditions. |
|---|---|
| Bibliography: | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 content type line 23 |
| ISSN: | 2041-1723 2041-1723 |
| DOI: | 10.1038/s41467-021-25534-2 |