Iterative single-cell multi-omic integration using online learning
| Field | Value |
|---|---|
| Published in: | Nature Biotechnology Vol. 39; no. 8; pp. 1000–1007 |
| Main Authors: | , , , , , , , , , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: Nature Publishing Group US, 01.08.2021 |
| ISSN: | 1087-0156, 1546-1696 |
| Summary: | Integrating large single-cell gene expression, chromatin accessibility and DNA methylation datasets requires general and scalable computational approaches. Here we describe online integrative non-negative matrix factorization (iNMF), an algorithm for integrating large, diverse and continually arriving single-cell datasets. Our approach scales to arbitrarily large numbers of cells using fixed memory, iteratively incorporates new datasets as they are generated and allows many users to simultaneously analyze a single copy of a large dataset by streaming it over the internet. Iterative data addition can also be used to map new data to a reference dataset. Comparisons with previous methods indicate that the improvements in efficiency do not sacrifice dataset alignment and cluster preservation performance. We demonstrate the effectiveness of online iNMF by integrating more than 1 million cells on a standard laptop, integrating large single-cell RNA sequencing and spatial transcriptomic datasets, and iteratively constructing a single-cell multi-omic atlas of the mouse motor cortex. A new algorithm enables scalable and iterative integration of single-cell datasets. |
| Bibliography: | Author contributions: SP, CL, RC, JS, AR, JRN, MMB, JRE, and BR generated the snATAC-seq and snmC-seq data. JDW conceived the idea of online iNMF. CG and JDW developed and implemented the online iNMF algorithm. CG, JL, ARK, and JDW carried out data analyses and wrote the paper. All authors read and approved the final manuscript. Present affiliation: Department of Human Genetics, University of California Los Angeles, Los Angeles, CA, USA. |
| DOI: | 10.1038/s41587-021-00867-x |
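
The summary's claim that online iNMF handles arbitrarily many cells with fixed memory rests on a general online matrix-factorization idea: cells arrive in mini-batches, each batch is used once to update small running summaries, and the batch can then be discarded. The Python sketch below illustrates that idea for plain non-negative matrix factorization with accumulated sufficient statistics and column-wise (HALS-style) updates. It is not the authors' online iNMF, which factorizes several datasets jointly with a shared factor plus dataset-specific factors; `OnlineNMF`, `nnls_cd`, `relative_error` and all parameter values are hypothetical names chosen for illustration, and NumPy is assumed to be available.

```python
import numpy as np


def nnls_cd(W, x, n_iter=50):
    """Solve min_{h >= 0} ||x - W h||^2 by cyclic coordinate descent."""
    WtW = W.T @ W
    Wtx = W.T @ x
    h = np.zeros(W.shape[1])
    for _ in range(n_iter):
        for j in range(h.size):
            if WtW[j, j] <= 0.0:
                continue  # zero column in W: h[j] has no effect, keep it at 0
            grad_j = WtW[j] @ h - Wtx[j]  # partial derivative (up to a factor 2)
            h[j] = max(0.0, h[j] - grad_j / WtW[j, j])
    return h


class OnlineNMF:
    """Hypothetical online NMF: X (genes x cells) ~ W H with W >= 0, H >= 0.

    Only W and two fixed-size summaries (A, B) are stored, so memory does
    not grow with the number of cells streamed through partial_fit.
    """

    def __init__(self, n_genes, k, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.random((n_genes, k))  # gene loadings, genes x k
        self.A = np.zeros((k, k))          # running sum of H H^T, k x k
        self.B = np.zeros((n_genes, k))    # running sum of X H^T, genes x k

    def transform(self, X):
        """Cell loadings for X under the current W; W, A and B are not changed."""
        return np.column_stack([nnls_cd(self.W, x) for x in X.T])

    def partial_fit(self, X_batch):
        """Use one mini-batch (genes x cells) once; it can be discarded afterwards."""
        H = self.transform(X_batch)
        self.A += H @ H.T
        self.B += X_batch @ H.T
        # Exact minimization over each column of W in turn (HALS) of
        # tr(W^T W A) - 2 tr(W^T B), i.e. the accumulated squared error
        # over all batches seen so far, up to a constant.
        for j in range(self.W.shape[1]):
            if self.A[j, j] > 1e-12:
                step = (self.B[:, j] - self.W @ self.A[:, j]) / self.A[j, j]
                self.W[:, j] = np.maximum(0.0, self.W[:, j] + step)
        return H


def relative_error(model, X):
    """||X - W H|| / ||X||, with H fitted to X by model.transform."""
    H = model.transform(X)
    return np.linalg.norm(X - model.W @ H) / np.linalg.norm(X)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    W_true = rng.random((20, 3))          # synthetic non-negative "gene programs"
    model = OnlineNMF(n_genes=20, k=3)
    X_test = W_true @ rng.random((3, 40))
    print("before:", relative_error(model, X_test))
    for _ in range(20):                   # stream 20 mini-batches of 50 cells
        model.partial_fit(W_true @ rng.random((3, 50)))
    print("after:", relative_error(model, X_test))
```

The design choice that gives fixed memory is that `A` is k×k and `B` is genes×k no matter how many cells have been processed. In this sketch, mapping new cells onto an existing factorization without changing it corresponds to calling `transform`, which solves only for the new cells' loadings.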