0062 Improved Circadian Data Ordering in the Presence of Biological and Technical Confounds
| Published in: | Sleep (New York, N.Y.) Vol. 43; Issue Supplement_1; pp. A24 - A26 |
|---|---|
| Main authors: | , |
| Format: | Journal Article |
| Language: | English |
| Published: | US: Oxford University Press, 27 May 2020 |
| Keywords: | |
| ISSN: | 0161-8105, 1550-9109 |
| Online access: | Full text |
| Summary: | Abstract
Introduction
We recently used unsupervised machine learning to order genome-scale data along a circadian cycle. CYCLOPS (Anafi et al., PNAS 2017) encodes high-dimensional genomic data onto an ellipse and offers the potential to identify circadian patterns in large datasets. This approach requires many samples from a wide range of circadian phases, and individual datasets often lack sufficient samples. Composite expression repositories vastly increase the available data. However, these agglomerated datasets also introduce technical (e.g., processing site) and biological (e.g., age or disease) confounders that may hamper circadian ordering.
Methods
Using the FLUX machine learning library, we expanded the CYCLOPS network by adding encoding and decoding layers that model the influence of labeled confounding variables. These layers feed into a fully connected autoencoder with a circular bottleneck that encodes the estimated phase of each sample. The expanded network thus estimates the influence of confounding variables and circadian phase simultaneously.
We compared the performance of the original and expanded networks using both real and simulated expression data. In the first test, we used time-labeled, single-center data from human cortical samples obtained at autopsy. To simulate a second, idealized processing center, we introduced gene-specific biases in expression along with a bias in sample collection time. In the second test, we combined human lung biopsy data from two medical centers.
Results
The performance of the original CYCLOPS network degraded as non-circadian confounds increased. The expanded network assessed circadian phase more accurately over a wider range of confounding influences.
Conclusion
The addition of labeled confounding variables into the network architecture improves circadian data ordering. The use of the expanded network should facilitate the application of CYCLOPS to multi-center data and expand the data available for circadian analysis.
Support
This work was supported by the National Cancer Institute (1R01CA227485-01) |
|---|---|
| DOI: | 10.1093/sleep/zsaa056.060 |
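
The Methods section of the abstract describes confounder layers wrapped around an autoencoder with a circular bottleneck. The following is a minimal illustrative sketch of that idea, not the authors' implementation: it is written in plain Python rather than with the FLUX library, it assumes the confounder layers act as additive per-batch, per-gene offsets, and it uses hand-set weights instead of trained ones. All names (`circular_bottleneck`, `forward`, `offsets`, the batch label `"center_B"`) are hypothetical.

```python
import math

TWO_PI = 2 * math.pi

def circular_bottleneck(a, b):
    """Project two bottleneck activations onto the unit circle and return the point and its phase."""
    norm = math.hypot(a, b)
    if norm == 0.0:
        raise ValueError("bottleneck activations must not both be zero")
    x, y = a / norm, b / norm
    return (x, y), math.atan2(y, x) % TWO_PI

def forward(expr, batch, offsets, w_enc, w_dec):
    """One forward pass: remove the batch offset, encode to the circle, decode, restore the offset.

    Assumption: the confounder is modeled as an additive per-gene offset for each labeled batch.
    """
    adjusted = [e - o for e, o in zip(expr, offsets[batch])]
    a = sum(w * v for w, v in zip(w_enc[0], adjusted))
    b = sum(w * v for w, v in zip(w_enc[1], adjusted))
    (x, y), phase = circular_bottleneck(a, b)
    decoded = [row[0] * x + row[1] * y for row in w_dec]
    recon = [d + o for d, o in zip(decoded, offsets[batch])]
    return recon, phase

def circular_error(p, q):
    """Smallest absolute angular distance between two phases, in radians."""
    d = abs(p - q) % TWO_PI
    return min(d, TWO_PI - d)

# Toy data: two genes that follow cos(phase) and sin(phase), plus a batch-specific bias.
offsets = {"center_A": [0.0, 0.0], "center_B": [2.0, -1.0]}
identity = [[1.0, 0.0], [0.0, 1.0]]  # hand-set weights standing in for trained ones
true_phase = math.pi / 3
sample = [math.cos(true_phase) + 2.0, math.sin(true_phase) - 1.0]  # drawn from center_B

# With the labeled confounder modeled, the phase is recovered.
recon, phase = forward(sample, "center_B", offsets, identity, identity)

# Ignoring the batch label (treating the sample as center_A) distorts the phase estimate.
_, naive_phase = forward(sample, "center_A", offsets, identity, identity)

print("phase error with confounder layer:", circular_error(phase, true_phase))
print("phase error without it:", circular_error(naive_phase, true_phase))
```

In a trained network the offsets and weights would be learned jointly from the data; the toy example only shows why an unmodeled batch bias can shift the estimated phase and why comparing phases requires a circular distance.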