Integration and transfer learning of single-cell transcriptomes via cFIT
| Published in: | Proceedings of the National Academy of Sciences - PNAS Vol. 118; No. 10 |
|---|---|
| Main authors: | Peng, Minshi; Li, Yue; Wamsley, Brie; Wei, Yuting; Roeder, Kathryn |
| Format: | Journal Article |
| Language: | English |
| Published: | United States, 9 March 2021 |
| Keywords: | transfer learning; single-cell RNA-seq; brain cells; data integration |
| ISSN: | 1091-6490 |

| Abstract | Large, comprehensive collections of single-cell RNA sequencing (scRNA-seq) datasets have been generated that allow for the full transcriptional characterization of cell types across a wide variety of biological and clinical conditions. As new methods arise to measure distinct cellular modalities, a key analytical challenge is to integrate these datasets or transfer knowledge from one to the other to better understand cellular identity and functions. Here, we present a simple yet surprisingly effective method named common factor integration and transfer learning (cFIT) for capturing various batch effects across experiments, technologies, subjects, and even species. The proposed method models the shared information between various datasets by a common factor space while allowing for unique distortions and shifts in genewise expression in each batch. The model parameters are learned under an iterative nonnegative matrix factorization (NMF) framework and then used for synchronized integration from across-domain assays. In addition, the model enables transferring via low-rank matrix from more informative data to allow for precise identification in data of lower quality. Compared with existing approaches, our method imposes weaker assumptions on the cell composition of each individual dataset; however, it is shown to be more reliable in preserving biological variations. We apply cFIT to multiple scRNA-seq datasets of developing brain from human and mouse, varying by technologies and developmental stages. The successful integration and transfer uncover the transcriptional resemblance across systems. The study helps establish a comprehensive landscape of brain cell-type diversity and provides insights into brain development. |
|---|---|
| Author | Li, Yue; Wamsley, Brie; Wei, Yuting; Peng, Minshi; Roeder, Kathryn |
| Author_xml | 1. Peng, Minshi (Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213); 2. Li, Yue (Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213); 3. Wamsley, Brie (Neurogenetics Program, University of California, Los Angeles, CA 90095); 4. Wei, Yuting (Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213); 5. Roeder, Kathryn (ORCID 0000-0002-8869-6254; roeder@andrew.cmu.edu; Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213) |
| BackLink | View this record in MEDLINE/PubMed: https://www.ncbi.nlm.nih.gov/pubmed/33658382 |
| CitedBy_id | crossref_primary_10_1073_pnas_2416516122 crossref_primary_10_1158_0008_5472_CAN_23_3005 crossref_primary_10_1016_j_crmeth_2023_100670 crossref_primary_10_1093_nargab_lqad024 crossref_primary_10_1038_s41467_023_36983_2 crossref_primary_10_1109_TIT_2021_3065700 crossref_primary_10_2174_1574893618666221130094050 crossref_primary_10_1038_s41467_025_63425_y crossref_primary_10_1093_pnasnexus_pgae449 crossref_primary_10_1186_s13059_025_03675_7 crossref_primary_10_1214_23_AOS2341 crossref_primary_10_1038_s41467_022_31411_3 crossref_primary_10_1038_s41588_022_01104_0 crossref_primary_10_1186_s13059_021_02499_5 crossref_primary_10_1186_s13073_021_00944_5 crossref_primary_10_1016_j_cels_2023_12_002 crossref_primary_10_1093_nar_gkab481 crossref_primary_10_1038_s41556_023_01337_z |
| ContentType | Journal Article |
| Copyright | Copyright © 2021 the Author(s). Published by PNAS. |
| DBID | CGR CUY CVF ECM EIF NPM 7X8 |
| DOI | 10.1073/pnas.2024383118 |
| DatabaseName | Medline MEDLINE MEDLINE (Ovid) MEDLINE MEDLINE PubMed MEDLINE - Academic |
| DatabaseTitle | MEDLINE Medline Complete MEDLINE with Full Text PubMed MEDLINE (Ovid) MEDLINE - Academic |
| DatabaseTitleList | MEDLINE MEDLINE - Academic |
| Database_xml | 1. PubMed (dbid NPM; http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed; Index Database); 2. MEDLINE - Academic (dbid 7X8; https://search.proquest.com/medline; Aggregation Database) |
| DeliveryMethod | no_fulltext_linktorsrc |
| Discipline | Sciences (General) |
| EISSN | 1091-6490 |
| ExternalDocumentID | 33658382 |
| Genre | Research Support, U.S. Gov't, Non-P.H.S.; Journal Article; Research Support, N.I.H., Extramural |
| GrantInformation_xml | NIMH NIH HHS, grant R01 MH123184; NIMH NIH HHS, grant U01 MH116489; NIMH NIH HHS, grant R37 MH057881 |
| ID | FETCH-LOGICAL-c479t-f06591b8a78f29cf3cb4dc1f730c1e0613348fb97ec0007d9b8ac1814dab5c3e2 |
| IEDL.DBID | 7X8 |
| ISICitedReferencesCount | 22 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000627429100100&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 1091-6490 |
| IngestDate | Thu Sep 04 18:41:28 EDT 2025 Thu Apr 03 07:05:36 EDT 2025 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 10 |
| Keywords | transfer learning; single-cell RNA-seq; brain cells; data integration |
| Language | English |
| License | Copyright © 2021 the Author(s). Published by PNAS. |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-c479t-f06591b8a78f29cf3cb4dc1f730c1e0613348fb97ec0007d9b8ac1814dab5c3e2 |
| Notes | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23 |
| ORCID | 0000-0002-8869-6254 |
| OpenAccessLink | https://pubmed.ncbi.nlm.nih.gov/PMC7958425 |
| PMID | 33658382 |
| PQID | 2497115021 |
| PQPubID | 23479 |
| ParticipantIDs | proquest_miscellaneous_2497115021 pubmed_primary_33658382 |
| PublicationCentury | 2000 |
| PublicationDate | 2021-03-09 |
| PublicationDateYYYYMMDD | 2021-03-09 |
| PublicationDate_xml | – month: 03 year: 2021 text: 2021-03-09 day: 09 |
| PublicationDecade | 2020 |
| PublicationPlace | United States |
| PublicationPlace_xml | – name: United States |
| PublicationTitle | Proceedings of the National Academy of Sciences - PNAS |
| PublicationTitleAlternate | Proc Natl Acad Sci U S A |
| PublicationYear | 2021 |
| SSID | ssj0009580 |
| Score | 2.4927902 |
| SourceID | proquest pubmed |
| SourceType | Aggregation Database Index Database |
| SubjectTerms | Animals; Exome Sequencing; Humans; Machine Learning; Mice; RNA-Seq; Single-Cell Analysis; Software; Transcriptome |
| Title | Integration and transfer learning of single-cell transcriptomes via cFIT |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/33658382; https://www.proquest.com/docview/2497115021 |
| Volume | 118 |
| WOSCitedRecordID | wos000627429100100 |
| hasFullText | |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| linkProvider | ProQuest |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Integration+and+transfer+learning+of+single-cell+transcriptomes+via+cFIT&rft.jtitle=Proceedings+of+the+National+Academy+of+Sciences+-+PNAS&rft.au=Peng%2C+Minshi&rft.au=Li%2C+Yue&rft.au=Wamsley%2C+Brie&rft.au=Wei%2C+Yuting&rft.date=2021-03-09&rft.eissn=1091-6490&rft.volume=118&rft.issue=10&rft_id=info:doi/10.1073%2Fpnas.2024383118&rft_id=info%3Apmid%2F33658382&rft_id=info%3Apmid%2F33658382&rft.externalDocID=33658382 |
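
The abstract in this record describes cFIT only in words. It uses a common factor space shared by all batches, batch-specific genewise distortions and shifts, and parameters learned by iterative NMF. The learned factors are then reused to transfer information to a lower-quality dataset.

The Python/NumPy sketch below makes one reading of that description concrete. It is an illustration under stated assumptions, not the authors' implementation, and it rests on the following assumptions, none of which come from the record:

- Each batch matrix X_b (cells x genes, nonnegative normalized expression) is modelled as X_b ~= H_b W diag(lam_b) + 1 s_b^T. Here W is shared, and H_b, lam_b and s_b are batch-specific.
- The shifts s_b are constrained to be nonnegative, so that Lee-Seung-style multiplicative updates remain valid.
- lam_b and s_b are updated by projected closed-form least squares.

The names `fit_cfit_sketch`, `transfer_sketch`, `objective`, `_update_batch` and `_update_W` are hypothetical and are not the API of the published cFIT software.

```python
import numpy as np

EPS = 1e-10  # keeps denominators away from zero


def _update_batch(X, H, W, lam, s):
    """One pass of batch-specific updates for the ASSUMED model
    X ~= (H @ W) * lam + s, i.e. H W diag(lam) + 1 s^T, all entries >= 0."""
    A = H @ W  # factor part before genewise scaling, shape (n_b, p)
    # Genewise shift: per-gene least-squares minimiser, projected onto s >= 0.
    s = np.maximum((X - A * lam).mean(axis=0), 0.0)
    # Genewise scaling: per-gene least squares given the new shift, projected onto lam >= 0.
    R = X - s
    lam = np.maximum((A * R).sum(axis=0) / ((A * A).sum(axis=0) + EPS), 0.0)
    # Cell loadings: multiplicative update X M^T / (H M M^T + 1 s^T M^T).
    M = W * lam  # equals W @ diag(lam), shape (k, p)
    H = H * (X @ M.T) / (H @ M @ M.T + s @ M.T + EPS)
    return H, lam, s


def _update_W(Xs, Hs, W, lams, ss):
    """Multiplicative update of the shared factors W, pooling all batches."""
    num = np.zeros_like(W)
    den = np.zeros_like(W)
    for X, H, lam, s in zip(Xs, Hs, lams, ss):
        num += (H.T @ X) * lam
        den += (H.T @ ((H @ W) * lam + s)) * lam
    return W * num / (den + EPS)


def objective(Xs, W, Hs, lams, ss):
    """Squared reconstruction error summed over batches."""
    return sum(float(np.sum((X - (H @ W) * lam - s) ** 2))
               for X, H, lam, s in zip(Xs, Hs, lams, ss))


def fit_cfit_sketch(Xs, k, n_iter=200, seed=0):
    """Integration: learn shared W (k x p) plus per-batch H_b, lam_b, s_b."""
    rng = np.random.default_rng(seed)
    p = Xs[0].shape[1]
    W = rng.random((k, p))
    Hs = [rng.random((X.shape[0], k)) for X in Xs]
    lams = [np.ones(p) for _ in Xs]
    ss = [np.zeros(p) for _ in Xs]
    for _ in range(n_iter):
        for b, X in enumerate(Xs):
            Hs[b], lams[b], ss[b] = _update_batch(X, Hs[b], W, lams[b], ss[b])
        W = _update_W(Xs, Hs, W, lams, ss)
        # Remove the scale ambiguity between W and H_b without changing the fit.
        d = np.maximum(np.linalg.norm(W, axis=1), EPS)
        W = W / d[:, None]
        Hs = [H * d for H in Hs]
    return W, Hs, lams, ss


def transfer_sketch(X_new, W, n_iter=200, seed=0):
    """Transfer: keep the learned W fixed and fit only the new batch's terms."""
    rng = np.random.default_rng(seed)
    H = rng.random((X_new.shape[0], W.shape[0]))
    lam, s = np.ones(W.shape[1]), np.zeros(W.shape[1])
    for _ in range(n_iter):
        H, lam, s = _update_batch(X_new, H, W, lam, s)
    return H, lam, s
```

Because W is shared, the rows of every H_b lie in the same k-dimensional space. They could therefore be clustered or embedded jointly as an integrated representation. In the transfer step only the new batch's loadings, scalings and shifts are re-estimated, while W stays fixed.

Rescaling the rows of W to unit norm, with the opposite rescaling applied to the columns of each H_b, leaves the fit unchanged. It only removes a scale ambiguity.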