Learning interpretable cellular and gene signature embeddings from single-cell transcriptomic data

Bibliographic details
Published in:	Nature Communications Vol. 12; Issue 1; p. 5261 - 15
Main authors:	Zhao, Yifan; Cai, Huiyu; Zhang, Zuobai; Tang, Jian; Li, Yue
Format:	Journal Article
Language:	English
Published:	London: Nature Publishing Group UK, 6 September 2021; Nature Publishing Group; Nature Portfolio
Subjects:	631/114/1305; 631/114/2404; Alzheimer Disease - genetics; Alzheimer Disease - pathology; Animals; Clustering; Coders; Computer applications; Computer science; Data analysis; Databases, Genetic; Datasets; Depressive Disorder, Major - genetics; Depressive Disorder, Major - pathology; Gene expression; Gene Expression Profiling - methods; Gene Expression Profiling - statistics & numerical data; Gene sequencing; Gene set enrichment analysis; Genes, Mitochondrial; Humanities and Social Sciences; Humans; Learning; Mathematical analysis; Mice; Models, Genetic; multidisciplinary; Neural networks; Neural Networks, Computer; Retina - cytology; Retina - physiology; Ribonucleic acid; RNA; RNA, Small Cytoplasmic; Science; Science (multidisciplinary); Sequence Analysis, RNA - methods; Sequence Analysis, RNA - statistics & numerical data; Single-Cell Analysis - methods; Transcriptomics; Transfer learning
ISSN:	2041-1723

Description
Abstract:	The advent of single-cell RNA sequencing (scRNA-seq) technologies has revolutionized transcriptomic studies. However, large-scale integrative analysis of scRNA-seq data remains a challenge, largely due to unwanted batch effects and the limited transferability, interpretability, and scalability of the existing computational methods. We present single-cell Embedded Topic Model (scETM). Our key contribution is the utilization of a transferable neural-network-based encoder while having an interpretable linear decoder via a matrix tri-factorization. In particular, scETM simultaneously learns an encoder network to infer cell type mixture and a set of highly interpretable gene embeddings, topic embeddings, and batch-effect linear intercepts from multiple scRNA-seq datasets. scETM is scalable to over 10^6 cells and confers remarkable cross-tissue and cross-species zero-shot transfer-learning performance. Using gene set enrichment analysis, we find that scETM-learned topics are enriched in biologically meaningful and disease-related pathways. Lastly, scETM enables the incorporation of known gene sets into the gene embeddings, thereby directly learning the associations between pathways and topics via the topic embeddings. Computational single-cell RNA-seq analyses often face challenges in scalability, model interpretability, and confounders. Here, we show a new model to address these challenges by learning meaningful embeddings from the data that simultaneously refine gene signatures and cell functions in diverse conditions.
DOI:	10.1038/s41467-021-25534-2
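
Illustrative sketch (not part of the catalog record): the abstract describes a linear decoder built from a matrix tri-factorization over cell-topic mixtures, topic embeddings, and gene embeddings, plus batch-effect linear intercepts. The NumPy snippet below shows one way such a decoder could be written. The variable names (theta, alpha, rho, batch_intercepts), the toy dimensions, the random values, and the final softmax over genes are assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def decode(theta, alpha, rho, batch_intercepts, batch_ids):
    """Linear tri-factorization decoder (illustrative assumption).

    theta:            (cells, topics)    cell-topic mixture, rows sum to 1
    alpha:            (topics, emb_dim)  topic embeddings
    rho:              (emb_dim, genes)   gene embeddings
    batch_intercepts: (batches, genes)   one linear intercept per batch and gene
    batch_ids:        (cells,)           batch index of each cell
    """
    # Product of the three factors, plus the intercept row of each cell's batch.
    logits = theta @ alpha @ rho + batch_intercepts[batch_ids]
    # Assumed output: a distribution over genes for every cell.
    return softmax(logits, axis=1)


# Toy dimensions and random parameters, chosen only for this example.
rng = np.random.default_rng(0)
n_cells, n_topics, emb_dim, n_genes, n_batches = 5, 3, 4, 6, 2

theta = softmax(rng.normal(size=(n_cells, n_topics)), axis=1)
alpha = rng.normal(size=(n_topics, emb_dim))
rho = rng.normal(size=(emb_dim, n_genes))
batch_intercepts = rng.normal(size=(n_batches, n_genes))
batch_ids = np.array([0, 0, 1, 1, 0])

expected_expression = decode(theta, alpha, rho, batch_intercepts, batch_ids)
```

Because the decoder is linear before the softmax, each topic's gene-level scores can be read directly from the product alpha @ rho, which is the sense in which the abstract calls the decoder interpretable.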