Iterative single-cell multi-omic integration using online learning

Bibliographic Details
Published in: Nature Biotechnology, Vol. 39, no. 8, pp. 1000-1007
Main Authors: Gao, Chao; Liu, Jialin; Kriebel, April R.; Preissl, Sebastian; Luo, Chongyuan; Castanon, Rosa; Sandoval, Justin; Rivkin, Angeline; Nery, Joseph R.; Behrens, Margarita M.; Ecker, Joseph R.; Ren, Bing; Welch, Joshua D.
Format: Journal Article
Language: English
Published: New York, Nature Publishing Group US, 01.08.2021
ISSN: 1087-0156, 1546-1696
Description
Summary: Integrating large single-cell gene expression, chromatin accessibility and DNA methylation datasets requires general and scalable computational approaches. Here we describe online integrative non-negative matrix factorization (iNMF), an algorithm for integrating large, diverse and continually arriving single-cell datasets. Our approach scales to arbitrarily large numbers of cells using fixed memory, iteratively incorporates new datasets as they are generated and allows many users to simultaneously analyze a single copy of a large dataset by streaming it over the internet. Iterative data addition can also be used to map new data to a reference dataset. Comparisons with previous methods indicate that the improvements in efficiency do not sacrifice dataset alignment and cluster preservation performance. We demonstrate the effectiveness of online iNMF by integrating more than 1 million cells on a standard laptop, integrating large single-cell RNA sequencing and spatial transcriptomic datasets, and iteratively constructing a single-cell multi-omic atlas of the mouse motor cortex. A new algorithm enables scalable and iterative integration of single-cell datasets.
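The summary's claim of fixed memory use can be illustrated with a generic mini-batch (online) non-negative matrix factorization. The sketch below is not the authors' implementation and leaves out the integrative part of iNMF (joint handling of several datasets). It only shows, under simplifying assumptions, how a factor matrix might be updated from streamed batches of cells while keeping two small summary matrices whose size does not depend on the number of cells seen. All names (`online_nmf`, `synthetic_batches`, `A`, `B`) and the synthetic data are hypothetical.

```python
import numpy as np

def online_nmf(batches, n_genes, k=5, n_inner=20, seed=0, eps=1e-10):
    """Generic mini-batch NMF sketch: X_batch ~ H_batch @ W with W, H >= 0.

    Only W (k x genes) and the summaries A (k x k), B (k x genes) persist
    between batches, so memory does not grow with the number of cells.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((k, n_genes))   # factor-by-gene matrix (assumed layout)
    A = np.zeros((k, k))           # running sum of H.T @ H
    B = np.zeros((k, n_genes))     # running sum of H.T @ X
    for X in batches:
        # Estimate cell loadings for this batch only (multiplicative updates).
        H = rng.random((X.shape[0], k))
        WWt = W @ W.T
        XWt = X @ W.T
        for _ in range(n_inner):
            H *= XWt / (H @ WWt + eps)
        # Fold the batch into the fixed-size summaries, then discard H and X.
        A += H.T @ H
        B += H.T @ X
        # One pass of projected coordinate descent on each row of W.
        for j in range(k):
            if A[j, j] > eps:
                W[j] = np.maximum(W[j] + (B[j] - A[j] @ W) / A[j, j], 0.0)
    return W, A, B

def synthetic_batches(n_batches=10, batch_size=50, n_genes=30, k=5, seed=1):
    """Yield non-negative toy batches one at a time, mimicking a data stream."""
    rng = np.random.default_rng(seed)
    W_true = rng.random((k, n_genes))
    for _ in range(n_batches):
        yield rng.random((batch_size, k)) @ W_true

W, A, B = online_nmf(synthetic_batches(), n_genes=30, k=5)
```

Because the batches come from a generator, no more than one batch of cells is held in memory at any time; adding a new dataset later would, in this simplified picture, amount to continuing the loop with further batches.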
Author Contributions: SP, CL, RC, JS, AR, JRN, MMB, JRE, and BR generated the snATAC-seq and snmC-seq data. JDW conceived the idea of online iNMF. CG and JDW developed and implemented the online iNMF algorithm. CG, JL, ARK, and JDW carried out data analyses. CG, JL, ARK, and JDW wrote the paper. All authors read and approved the final manuscript.
Present Affiliation: Department of Human Genetics, University of California Los Angeles, Los Angeles, CA, USA
DOI: 10.1038/s41587-021-00867-x