Integration and transfer learning of single-cell transcriptomes via cFIT

Large, comprehensive collections of single-cell RNA sequencing (scRNA-seq) datasets have been generated that allow for the full transcriptional characterization of cell types across a wide variety of biological and clinical conditions. As new methods arise to measure distinct cellular modalities, a...

Detailed description

Bibliographic details
Published in: Proceedings of the National Academy of Sciences - PNAS, Vol. 118, No. 10
Main authors: Peng, Minshi; Li, Yue; Wamsley, Brie; Wei, Yuting; Roeder, Kathryn
Format: Journal Article
Language: English
Published: United States, 9 March 2021
ISSN: 1091-6490
Abstract Large, comprehensive collections of single-cell RNA sequencing (scRNA-seq) datasets have been generated that allow for the full transcriptional characterization of cell types across a wide variety of biological and clinical conditions. As new methods arise to measure distinct cellular modalities, a key analytical challenge is to integrate these datasets or transfer knowledge from one to the other to better understand cellular identity and functions. Here, we present a simple yet surprisingly effective method named common factor integration and transfer learning (cFIT) for capturing various batch effects across experiments, technologies, subjects, and even species. The proposed method models the shared information between various datasets by a common factor space while allowing for unique distortions and shifts in genewise expression in each batch. The model parameters are learned under an iterative nonnegative matrix factorization (NMF) framework and then used for synchronized integration from across-domain assays. In addition, the model enables transferring via low-rank matrix from more informative data to allow for precise identification in data of lower quality. Compared with existing approaches, our method imposes weaker assumptions on the cell composition of each individual dataset; however, it is shown to be more reliable in preserving biological variations. We apply cFIT to multiple scRNA-seq datasets of developing brain from human and mouse, varying by technologies and developmental stages. The successful integration and transfer uncover the transcriptional resemblance across systems. The study helps establish a comprehensive landscape of brain cell-type diversity and provides insights into brain development.
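The abstract's description of the model (a common factor space shared by all datasets, plus batch-specific distortions and shifts in genewise expression) can be illustrated with a small simulation. The following is a minimal sketch, not the authors' cFIT implementation: the function names `simulate_batches` and `remove_batch_effect`, the matrix shapes, and the reading of "distortions and shifts" as a per-gene multiplicative scale and additive offset are assumptions made for illustration. The batch parameters are treated as known here, whereas the abstract states that they are learned under an iterative NMF framework, which this sketch does not implement.

```python
import numpy as np


def simulate_batches(n_cells=(50, 60), n_genes=20, n_factors=3, seed=0):
    """Simulate batches that share one nonnegative factor matrix W.

    Assumed form for each batch j (illustrative only):
        X_j = (H_j @ W.T) * scale_j + shift_j
    where scale_j and shift_j are per-gene vectors unique to the batch.
    """
    rng = np.random.default_rng(seed)
    # Shared gene-by-factor matrix, nonnegative by construction.
    W = rng.gamma(2.0, 1.0, size=(n_genes, n_factors))
    batches = []
    for n in n_cells:
        # Cell-by-factor loadings; each row is nonnegative and sums to 1.
        H = rng.dirichlet(np.ones(n_factors), size=n)
        # Batch-specific genewise distortion (scale) and shift.
        scale = rng.uniform(0.5, 2.0, size=n_genes)
        shift = rng.uniform(0.0, 1.0, size=n_genes)
        X = (H @ W.T) * scale + shift
        batches.append({"X": X, "H": H, "scale": scale, "shift": shift})
    return W, batches


def remove_batch_effect(X, scale, shift):
    """Undo the assumed genewise shift and scale of a single batch."""
    return (X - shift) / scale


W, batches = simulate_batches()
# After correction, every batch lies in the same common factor space H_j @ W.T.
corrected = [remove_batch_effect(b["X"], b["scale"], b["shift"]) for b in batches]
```

In this toy setting the corrected matrices from different batches become directly comparable because they are all expressed through the same `W`. With real data the scale, shift, and factor matrices would have to be estimated, and the quality of the integration depends on that estimation.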
Author Li, Yue
Wamsley, Brie
Wei, Yuting
Peng, Minshi
Roeder, Kathryn
Author_xml – sequence: 1
  givenname: Minshi
  surname: Peng
  fullname: Peng, Minshi
  organization: Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213
– sequence: 2
  givenname: Yue
  surname: Li
  fullname: Li, Yue
  organization: Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213
– sequence: 3
  givenname: Brie
  surname: Wamsley
  fullname: Wamsley, Brie
  organization: Neurogenetics Program, University of California, Los Angeles, CA 90095
– sequence: 4
  givenname: Yuting
  surname: Wei
  fullname: Wei, Yuting
  organization: Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213
– sequence: 5
  givenname: Kathryn
  orcidid: 0000-0002-8869-6254
  surname: Roeder
  fullname: Roeder, Kathryn
  email: roeder@andrew.cmu.edu
  organization: Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213
BackLink https://www.ncbi.nlm.nih.gov/pubmed/33658382 (View this record in MEDLINE/PubMed)
CitedBy_id crossref_primary_10_1073_pnas_2416516122
crossref_primary_10_1158_0008_5472_CAN_23_3005
crossref_primary_10_1016_j_crmeth_2023_100670
crossref_primary_10_1093_nargab_lqad024
crossref_primary_10_1038_s41467_023_36983_2
crossref_primary_10_1109_TIT_2021_3065700
crossref_primary_10_2174_1574893618666221130094050
crossref_primary_10_1038_s41467_025_63425_y
crossref_primary_10_1093_pnasnexus_pgae449
crossref_primary_10_1186_s13059_025_03675_7
crossref_primary_10_1214_23_AOS2341
crossref_primary_10_1038_s41467_022_31411_3
crossref_primary_10_1038_s41588_022_01104_0
crossref_primary_10_1186_s13059_021_02499_5
crossref_primary_10_1186_s13073_021_00944_5
crossref_primary_10_1016_j_cels_2023_12_002
crossref_primary_10_1093_nar_gkab481
crossref_primary_10_1038_s41556_023_01337_z
ContentType Journal Article
Copyright Copyright © 2021 the Author(s). Published by PNAS.
Copyright_xml – notice: Copyright © 2021 the Author(s). Published by PNAS.
DOI 10.1073/pnas.2024383118
Discipline Sciences (General)
EISSN 1091-6490
ExternalDocumentID 33658382
Genre Research Support, U.S. Gov't, Non-P.H.S
Journal Article
Research Support, N.I.H., Extramural
GrantInformation_xml – fundername: NIMH NIH HHS
  grantid: R01 MH123184
– fundername: NIMH NIH HHS
  grantid: U01 MH116489
– fundername: NIMH NIH HHS
  grantid: R37 MH057881
ISICitedReferencesCount 22
ISSN 1091-6490
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 10
Keywords transfer learning
single-cell RNA-seq
brain cells
data integration
Language English
License Copyright © 2021 the Author(s). Published by PNAS.
ORCID 0000-0002-8869-6254
OpenAccessLink https://pubmed.ncbi.nlm.nih.gov/PMC7958425
PMID 33658382
PublicationDate 2021-03-09
PublicationDateYYYYMMDD 2021-03-09
PublicationDate_xml – month: 03
  year: 2021
  text: 2021-03-09
  day: 09
PublicationDecade 2020
PublicationPlace United States
PublicationPlace_xml – name: United States
PublicationTitle Proceedings of the National Academy of Sciences - PNAS
PublicationTitleAlternate Proc Natl Acad Sci U S A
PublicationYear 2021
SubjectTerms Animals
Exome Sequencing
Humans
Machine Learning
Mice
RNA-Seq
Single-Cell Analysis
Software
Transcriptome
Title Integration and transfer learning of single-cell transcriptomes via cFIT
URI https://www.ncbi.nlm.nih.gov/pubmed/33658382
https://www.proquest.com/docview/2497115021
Volume 118