Visualizing the structure of RNA-seq expression data using grade of membership models

Bibliographic details
Published in: PLoS genetics Vol. 13; No. 3; p. e1006599
Main authors: Dey, Kushal K., Hsiao, Chiaowen Joyce, Stephens, Matthew
Format: Journal Article
Language: English
Published: United States, Public Library of Science, 23 March 2017
Public Library of Science (PLoS)
ISSN: 1553-7404, 1553-7390
Online access: Full text
Description
Summary: Grade of membership models, also known as "admixture models", "topic models" or "Latent Dirichlet Allocation", are a generalization of cluster models that allow each sample to have membership in multiple clusters. These models are widely used in population genetics to model admixed individuals who have ancestry from multiple "populations", and in natural language processing to model documents having words from multiple "topics". Here we illustrate the potential for these models to cluster samples of RNA-seq gene expression data, measured on either bulk samples or single cells. We also provide methods to help interpret the clusters, by identifying genes that are distinctively expressed in each cluster. By applying these methods to several example RNA-seq applications we demonstrate their utility in identifying and summarizing structure and heterogeneity. Applied to data from the GTEx project on 53 human tissues, the approach highlights similarities among biologically-related tissues and identifies distinctively-expressed genes that recapitulate known biology. Applied to single-cell expression data from mouse preimplantation embryos, the approach highlights both discrete and continuous variation through early embryonic development stages, and highlights genes involved in a variety of relevant processes, from germ cell development, through compaction and morula formation, to the formation of inner cell mass and trophoblast at the blastocyst stage. The methods are implemented in the Bioconductor package CountClust.
The authors have declared that no competing interests exist.
Conceptualization: MS KKD. Data curation: MS. Formal analysis: KKD CJH. Funding acquisition: MS. Investigation: MS. Methodology: MS KKD. Project administration: MS. Resources: MS. Software: KKD CJH MS. Supervision: MS. Validation: KKD CJH MS. Visualization: KKD CJH MS. Writing – original draft: KKD CJH MS. Writing – review & editing: KKD CJH MS.
DOI:10.1371/journal.pgen.1006599