Iterative single-cell multi-omic integration using online learning


Bibliographic Details
Published in: Nature Biotechnology, Vol. 39, No. 8, pp. 1000-1007
Main Authors: Gao, Chao; Liu, Jialin; Kriebel, April R.; Preissl, Sebastian; Luo, Chongyuan; Castanon, Rosa; Sandoval, Justin; Rivkin, Angeline; Nery, Joseph R.; Behrens, Margarita M.; Ecker, Joseph R.; Ren, Bing; Welch, Joshua D.
Format: Journal Article
Language: English
Published: New York: Nature Publishing Group US, 1 August 2021
Nature Publishing Group
ISSN: 1087-0156, 1546-1696
Description
Summary: Integrating large single-cell gene expression, chromatin accessibility and DNA methylation datasets requires general and scalable computational approaches. Here we describe online integrative non-negative matrix factorization (iNMF), an algorithm for integrating large, diverse and continually arriving single-cell datasets. Our approach scales to arbitrarily large numbers of cells using fixed memory, iteratively incorporates new datasets as they are generated and allows many users to simultaneously analyze a single copy of a large dataset by streaming it over the internet. Iterative data addition can also be used to map new data to a reference dataset. Comparisons with previous methods indicate that the improvements in efficiency do not sacrifice dataset alignment and cluster preservation performance. We demonstrate the effectiveness of online iNMF by integrating more than 1 million cells on a standard laptop, integrating large single-cell RNA sequencing and spatial transcriptomic datasets, and iteratively constructing a single-cell multi-omic atlas of the mouse motor cortex. A new algorithm enables scalable and iterative integration of single-cell datasets.
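The fixed-memory property mentioned in the summary can be illustrated with a generic mini-batch NMF sketch. This is not the authors' online iNMF implementation and does not attempt to reproduce the integrative model described in the paper. It only shows, under simplifying assumptions, how a shared non-negative factor can be updated from streamed batches while keeping just two small summary matrices in memory. The function name `online_nmf` and all of its parameters are hypothetical, and plain multiplicative updates are used as a stand-in solver.

```python
import numpy as np

def online_nmf(batches, n_features, k, n_inner=50, seed=0, eps=1e-10):
    """Generic mini-batch NMF sketch: each batch X (cells x features) is
    approximated as H @ W, where W (k x features) is shared across batches.

    Illustrative only; not the online iNMF algorithm from the paper.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((k, n_features)) + 0.1
    # Running summaries whose size does not depend on the number of cells.
    A = np.zeros((k, k))            # accumulates H^T H
    B = np.zeros((k, n_features))   # accumulates H^T X

    for X in batches:               # batches may be any iterable or generator
        # Fit per-cell loadings H for this batch with W held fixed.
        H = rng.random((X.shape[0], k)) + 0.1
        WWt = W @ W.T
        XWt = X @ W.T
        for _ in range(n_inner):
            H *= XWt / (H @ WWt + eps)

        # Fold the batch into the summaries; the batch can then be discarded.
        A += H.T @ H
        B += H.T @ X

        # Update W against all data seen so far, using only A and B.
        for _ in range(n_inner):
            W *= B / (A @ W + eps)
    return W
```

Because `batches` can be a generator that reads chunks from disk or from a network stream, peak memory in this sketch is governed by the batch size and by `k` and `n_features`, not by the total number of cells.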
Notes:
Author Contributions: SP, CL, RC, JS, AR, JRN, MMB, JRE, and BR generated the snATAC-seq and snmC-seq data. JDW conceived the idea of online iNMF. CG and JDW developed and implemented the online iNMF algorithm. CG, JL, ARK, and JDW carried out data analyses. CG, JL, ARK, and JDW wrote the paper. All authors read and approved the final manuscript.
Present Affiliation: Department of Human Genetics, University of California Los Angeles, Los Angeles, CA, USA
DOI: 10.1038/s41587-021-00867-x