A generalized and versatile framework to train and evaluate autoencoders for biological representation learning and beyond: AUTOENCODIX

Bibliographic Details
Published in: bioRxiv
Main Authors: Joas, Maximilian; Jurenaite, Neringa; Praščević, Dušan; Scherf, Nico; Ewald, Jan
Format: Paper
Language: English
Published: Cold Spring Harbor: Cold Spring Harbor Laboratory Press, 20 December 2024
Cold Spring Harbor Laboratory
Edition: 1.1
ISSN: 2692-8205
Online Access: Get full text
Description
Summary: Insights and discoveries in complex biological systems, e.g. for personalized medicine, are gained by combining large, feature-rich, high-dimensional data with powerful computational methods that uncover patterns and relationships. In recent years, autoencoders, a family of deep learning-based methods for representation learning, have been advancing data-driven research owing to their variability and their non-linear power for multi-modal data integration. Despite their success, current implementations lack standardization, versatility, comparability, and generalizability, preventing broad application. To fill this gap, we present AUTOENCODIX (https://github.com/jan-forest/autoencodix), an open-source framework designed as a standardized and flexible pipeline for preprocessing, training, and evaluation of autoencoder architectures. These architectures, such as ontology-based and cross-modal autoencoders, provide key advantages over traditional methods via explainability of embeddings or the ability to translate across data modalities. We show the value of our framework by applying it to data sets from pan-cancer studies (TCGA) and single-cell sequencing, as well as in combination with imaging. Our studies provide important user-centric insights and recommendations for navigating architectures, hyperparameters, and important trade-offs in representation learning. These include the reconstruction capability of input data, the quality of embeddings for downstream machine learning models, and the reliability of ontology-based embeddings for explainability. In summary, our versatile and generalizable framework allows multi-modal data integration in biomedical research and any other data-driven field of research. Hence, it can serve as an open-source platform for several major research trends using autoencoders, including architectural improvements, explainability, and training of large-scale pre-trained models.
Competing Interest Statement: The authors have declared no competing interest.
Footnotes: * https://github.com/jan-forest/autoencodix
Bibliography: SourceType-Working Papers-1
ObjectType-Working Paper/Pre-Print-1
DOI: 10.1101/2024.12.17.628906