A generalized and versatile framework to train and evaluate autoencoders for biological representation learning and beyond: AUTOENCODIX
| Published in: | bioRxiv |
|---|---|
| Main authors: | Maximilian Joas, Neringa Jurenaite, Dušan Praščević, Nico Scherf, Jan Ewald |
| Format: | Paper |
| Language: | English |
| Published: | Cold Spring Harbor: Cold Spring Harbor Laboratory Press; Cold Spring Harbor Laboratory, 20 December 2024 |
| Edition: | 1.1 |
| Subjects: | Bioinformatics; Deep learning; Embedding; Medical research; Ontology; Precision medicine; Sensory integration |
| ISSN: | 2692-8205 |
| Online access: | Full text |
| Abstract | Insights and discoveries in complex biological systems, e.g. for personalized medicine, are gained by combining large, feature-rich, high-dimensional data with powerful computational methods that uncover patterns and relationships. In recent years, autoencoders, a family of deep learning-based methods for representation learning, have advanced data-driven research owing to their variability and their non-linear power for multi-modal data integration. Despite their success, current implementations lack standardization, versatility, comparability, and generalizability, which prevents broad application. To fill this gap, we present AUTOENCODIX (https://github.com/jan-forest/autoencodix), an open-source framework designed as a standardized and flexible pipeline for preprocessing, training, and evaluation of autoencoder architectures. These architectures, such as ontology-based and cross-modal autoencoders, provide key advantages over traditional methods via explainability of embeddings or the ability to translate across data modalities. We show the value of our framework by applying it to data sets from pan-cancer studies (TCGA) and single-cell sequencing, as well as in combination with imaging. Our studies provide important user-centric insights and recommendations for navigating architectures, hyperparameters, and important trade-offs in representation learning. These include the reconstruction capability of input data, the quality of embeddings for downstream machine learning models, and the reliability of ontology-based embeddings for explainability. In summary, our versatile and generalizable framework allows multi-modal data integration in biomedical research and other data-driven fields of research. Hence, it can serve as an open-source platform for several major research trends using autoencoders, including architectural improvements, explainability, and training of large-scale pre-trained models. Competing Interest Statement: The authors have declared no competing interest. |
|---|---|
| Author | Maximilian Joas; Neringa Jurenaite; Dušan Praščević; Nico Scherf; Jan Ewald |
| ContentType | Paper |
| Copyright | 2024. This article is published under http://creativecommons.org/licenses/by/4.0/ (“the License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. 2024, Posted by Cold Spring Harbor Laboratory |
| DOI | 10.1101/2024.12.17.628906 |
| Discipline | Biology |
| EISSN | 2692-8205 |
| Edition | 1.1 |
| ExternalDocumentID | 2024.12.17.628906v1 |
| Genre | Working Paper/Pre-Print |
| ISSN | 2692-8205 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| License | This pre-print is available under a Creative Commons License (Attribution 4.0 International), CC BY 4.0, as described at http://creativecommons.org/licenses/by/4.0 |
| Notes | Competing Interest Statement: The authors have declared no competing interest. |
| ORCID | 0000-0002-9415-2317 0009-0005-2173-0621 0000-0003-4003-9121 0000-0002-3959-7452 |
| PageCount | 26 |
| PublicationDateYYYYMMDD | 2024-12-20 |
| PublicationPlace | Cold Spring Harbor |
| PublicationTitle | bioRxiv |
| PublicationYear | 2024 |
| Publisher | Cold Spring Harbor Laboratory Press; Cold Spring Harbor Laboratory |
| SecondaryResourceType | preprint |
| SubjectTerms | Bioinformatics; Deep learning; Embedding; Medical research; Ontology; Precision medicine; Sensory integration |
| Title | A generalized and versatile framework to train and evaluate autoencoders for biological representation learning and beyond: AUTOENCODIX |
| URI | https://www.proquest.com/docview/3147576682 https://www.biorxiv.org/content/10.1101/2024.12.17.628906 |