Extracting a biologically relevant latent space from cancer transcriptomes with variational autoencoders

Bibliographic Details
Published in: Biocomputing 2018, Vol. 23, pp. 80-91
Main Authors: Way, Gregory P.; Greene, Casey S.
Format: Book Chapter, Journal Article
Language: English
Published: United States, World Scientific, 01.01.2018
ISBN: 9789813235526, 9789813235540, 9813235543, 9813235527, 9789813235533, 9813235535
ISSN: 2335-6936
Description
Summary: The Cancer Genome Atlas (TCGA) has profiled over 10,000 tumors across 33 different cancer types for many genomic features, including gene expression levels. Gene expression measurements capture substantial information about the state of each tumor. Certain classes of deep neural network models are capable of learning a meaningful latent space. Such a latent space could be used to explore and generate hypothetical gene expression profiles under various types of molecular and genetic perturbation. For example, one might wish to use such a model to predict a tumor's response to specific therapies or to characterize complex gene expression activations existing in differential proportions in different tumors. Variational autoencoders (VAEs) are a deep neural network approach capable of generating meaningful latent spaces for image and text data. In this work, we sought to determine the extent to which a VAE can be trained to model cancer gene expression, and whether such a VAE would capture biologically relevant features. In the following report, we introduce a VAE trained on TCGA pan-cancer RNA-seq data, identify specific patterns in the VAE-encoded features, and discuss potential merits of the approach. We name our method "Tybalt" after an instigative, cat-like character who sets a cascading chain of events in motion in Shakespeare's "Romeo and Juliet". From a systems biology perspective, Tybalt could one day aid in cancer stratification or predict specific activated expression patterns that would result from genetic changes or treatment effects.
DOI: 10.1142/9789813235533_0008