A generalized and versatile framework to train and evaluate autoencoders for biological representation learning and beyond: AUTOENCODIX

Insights and discoveries in complex biological systems, e.g. for personalized medicine, are gained by the combination of large, feature-rich and high-dimensional data with powerful computational methods uncovering patterns and relationships. In recent years, autoencoders, a family of deep learning-b...

Bibliographic details
Published in: bioRxiv
Main authors: Joas, Maximilian; Jurenaite, Neringa; Praščević, Dušan; Scherf, Nico; Ewald, Jan
Format: Paper
Language: English
Published: Cold Spring Harbor: Cold Spring Harbor Laboratory Press; Cold Spring Harbor Laboratory, 20 December 2024
Edition: 1.1
ISSN: 2692-8205
Online access: Full text
Abstract: Insights and discoveries in complex biological systems, e.g. for personalized medicine, are gained by combining large, feature-rich, high-dimensional data with powerful computational methods that uncover patterns and relationships. In recent years, autoencoders, a family of deep learning-based methods for representation learning, have been advancing data-driven research owing to their variability and their non-linear power for multi-modal data integration. Despite their success, current implementations lack standardization, versatility, comparability, and generalizability, which prevents broad application. To fill this gap, we present AUTOENCODIX (https://github.com/jan-forest/autoencodix), an open-source framework designed as a standardized and flexible pipeline for preprocessing, training, and evaluation of autoencoder architectures. These architectures, such as ontology-based and cross-modal autoencoders, provide key advantages over traditional methods through explainable embeddings or the ability to translate across data modalities. We show the value of our framework by applying it to data sets from pan-cancer studies (TCGA) and single-cell sequencing, as well as in combination with imaging. Our studies provide important user-centric insights and recommendations for navigating architectures, hyperparameters, and important trade-offs in representation learning. These include the reconstruction capability of input data, the quality of embeddings for downstream machine learning models, and the reliability of ontology-based embeddings for explainability. In summary, our versatile and generalizable framework allows multi-modal data integration in biomedical research and other data-driven fields of research. Hence, it can serve as an open-source platform for several major research trends using autoencoders, including architectural improvements, explainability, and training of large-scale pre-trained models.
Competing Interest Statement: The authors have declared no competing interest.
Footnotes: https://github.com/jan-forest/autoencodix
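The abstract refers to two quantities that recur when autoencoders are compared: how well the input is reconstructed, and the embedding that is passed on to downstream models. The following minimal sketch illustrates both with a tiny linear autoencoder in plain Python. It is not AUTOENCODIX code and does not use its API; the function names (`encode`, `decode`, `mse`, `train`), the toy data, and the hyperparameters are assumptions made purely for illustration.

```python
# Illustrative sketch only: a linear autoencoder with a 1-dimensional
# embedding, trained by full-batch gradient descent. All names and
# values here are hypothetical and unrelated to the AUTOENCODIX code base.

def encode(w_enc, x):
    """Map an input vector to a scalar embedding z."""
    return sum(w * xi for w, xi in zip(w_enc, x))

def decode(w_dec, z):
    """Map a scalar embedding back to input space."""
    return [w * z for w in w_dec]

def mse(w_enc, w_dec, data):
    """Mean squared reconstruction error over all samples and features."""
    total = 0.0
    for x in data:
        x_hat = decode(w_dec, encode(w_enc, x))
        total += sum((xh - xi) ** 2 for xh, xi in zip(x_hat, x))
    return total / (len(data) * len(data[0]))

def train(data, steps=500, lr=0.05):
    """Minimise the reconstruction error; returns (w_enc, w_dec)."""
    dim = len(data[0])
    w_enc = [0.1] * dim
    w_dec = [0.1] * dim
    scale = 2.0 / (len(data) * dim)  # derivative factor of the mean squared error
    for _ in range(steps):
        g_enc = [0.0] * dim
        g_dec = [0.0] * dim
        for x in data:
            z = encode(w_enc, x)
            err = [xh - xi for xh, xi in zip(decode(w_dec, z), x)]
            # Gradient of the loss with respect to the embedding z
            dz = sum(e * w for e, w in zip(err, w_dec))
            for j in range(dim):
                g_dec[j] += scale * err[j] * z
                g_enc[j] += scale * dz * x[j]
        w_enc = [w - lr * g for w, g in zip(w_enc, g_enc)]
        w_dec = [w - lr * g for w, g in zip(w_dec, g_dec)]
    return w_enc, w_dec

# Toy data: three features that all depend on one hidden factor t,
# so a single embedding dimension can in principle reconstruct them.
data = [[t, 2 * t, -t] for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]

if __name__ == "__main__":
    w_enc, w_dec = train(data)
    print("reconstruction error:", mse(w_enc, w_dec, data))
    print("embeddings:", [encode(w_enc, x) for x in data])
```

In this toy setting the reconstruction error shrinks during training and the learned embedding orders the samples along the hidden factor. Real omics data would need non-linear layers, regularisation, and held-out evaluation, which is the kind of workflow the abstract says the framework standardizes.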
Copyright 2024. This article is published under http://creativecommons.org/licenses/by/4.0/ (“the License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
2024, Posted by Cold Spring Harbor Laboratory
DOI: 10.1101/2024.12.17.628906
Discipline: Biology
EISSN: 2692-8205
Genre: Working Paper/Pre-Print
DOI open access: true
Open access: true
Peer reviewed: false
Scholarly: false
License: This pre-print is available under a Creative Commons License (Attribution 4.0 International), CC BY 4.0, as described at http://creativecommons.org/licenses/by/4.0
ORCID: 0000-0002-9415-2317, 0009-0005-2173-0621, 0000-0003-4003-9121, 0000-0002-3959-7452
Page count: 26
Subject terms: Bioinformatics; Deep learning; Embedding; Medical research; Ontology; Precision medicine; Sensory integration
URI: https://www.proquest.com/docview/3147576682
URI: https://www.biorxiv.org/content/10.1101/2024.12.17.628906