0062 Improved Circadian Data Ordering in the Presence of Biological and Technical Confounds

Bibliographic Details
Published in: Sleep (New York, N.Y.) Vol. 43; Issue Supplement_1; pp. A24 - A26
Main authors: Hammarlund, J, Anafi, R
Format: Journal Article
Language: English
Published: US Oxford University Press 27 May 2020
ISSN: 0161-8105, 1550-9109
Online access: Full text
Description
Summary: Abstract

Introduction: We recently used unsupervised machine learning to order genome-scale data along a circadian cycle. CYCLOPS (Anafi et al., PNAS 2017) encodes high-dimensional genomic data onto an ellipse and offers the potential to identify circadian patterns in large datasets. This approach requires many samples from a wide range of circadian phases, and individual datasets often lack sufficient samples. Composite expression repositories vastly increase the available data. However, these agglomerated datasets also introduce technical (e.g. processing site) and biological (e.g. age or disease) confounders that may hamper circadian ordering.

Methods: Using the FLUX machine learning library, we expanded the CYCLOPS network. We incorporated additional encoding and decoding layers that model the influence of labeled confounding variables. These layers feed into a fully connected autoencoder with a circular bottleneck that encodes the estimated phase of each sample. The expanded network estimates the influence of confounding variables and the circadian phase simultaneously. We compared the performance of the original and expanded networks using both real and simulated expression data. In a first test, we used time-labeled, single-center data from human cortical samples obtained at autopsy. To simulate a second, idealized processing center, we introduced gene-specific biases in expression along with a bias in sample collection time. In a second test, we combined human lung biopsy data from two medical centers.

Results: The performance of the original CYCLOPS network degraded as non-circadian confounds increased. The expanded network assessed circadian phase more accurately over a wider range of confounding influences.

Conclusion: Adding labeled confounding variables to the network architecture improves circadian data ordering. Use of the expanded network should facilitate the application of CYCLOPS to multi-center data and expand the data available for circadian analysis.

Support: This work was supported by the National Cancer Institute (1R01CA227485-01).
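
The idea of pairing a confound adjustment with a circular bottleneck can be illustrated with a minimal Python sketch. It is not the authors' FLUX implementation: the additive, gene-specific offset model, the fixed (rather than learned) offsets and weights, and the function names `adjust_for_confound`, `circular_bottleneck`, and `encode_phase` are all assumptions made for illustration. The sketch shows only the forward pass: a labeled batch offset is subtracted, the result is mapped linearly to two values, and those values are projected onto the unit circle to give a phase.

```python
import math


def circular_bottleneck(x, y):
    """Project a 2-D activation onto the unit circle.

    Returns (cos, sin, phase), with phase in [0, 2*pi).
    """
    r = math.hypot(x, y)
    if r == 0.0:
        raise ValueError("a zero-length activation has no defined phase")
    c, s = x / r, y / r
    return c, s, math.atan2(s, c) % (2 * math.pi)


def adjust_for_confound(expression, batch, offsets):
    """Subtract the gene-specific offset of the sample's labeled batch.

    Assumption: confounds act additively per gene. In a trained network
    these offsets would be learned parameters; here they are given.
    """
    return [v - o for v, o in zip(expression, offsets[batch])]


def encode_phase(expression, batch, offsets, weights):
    """Adjust for the confound, map linearly to 2-D, and read off the phase."""
    adjusted = adjust_for_confound(expression, batch, offsets)
    x = sum(w * v for w, v in zip(weights[0], adjusted))
    y = sum(w * v for w, v in zip(weights[1], adjusted))
    return circular_bottleneck(x, y)[2]


if __name__ == "__main__":
    # Two hypothetical genes whose expression follows cos and sin of the phase.
    offsets = {"A": [0.0, 0.0], "B": [2.0, -1.0]}  # center B carries a bias
    weights = [[1.0, 0.0], [0.0, 1.0]]             # identity map, for clarity
    true_phase = math.pi / 3
    sample_b = [math.cos(true_phase) + 2.0, math.sin(true_phase) - 1.0]

    print("with batch label:   ", encode_phase(sample_b, "B", offsets, weights))
    print("ignoring batch bias:", encode_phase(sample_b, "A", offsets, weights))
```

In this toy setting the phase is recovered only when the sample is adjusted with the offset of its own labeled center; treating it as if it came from the unbiased center yields a very different phase estimate.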
DOI:10.1093/sleep/zsaa056.060