Structure-preserving deep embedded clustering algorithm for incomplete gene expression data

Bibliographic Details
Published in:	Chinese Control Conference pp. 8255 - 8261
Main Authors:	Wang, Zhencheng, Li, Dan
Format:	Conference Proceeding
Language:	English
Published:	Technical Committee on Control Theory, Chinese Association of Automation 28.07.2024
Subjects:	Analytical models Biology Clustering algorithms Correlation Data models Deep embedded clustering Gene expression data Imputation Joint optimization Missing value Representation learning Structure preserving
ISSN:	1934-1768

Description
Summary:	Missing values inevitably occur in gene expression data, which prevents clustering algorithms from being applied directly to incomplete gene expression data to identify co-expressed genes. Deep autoencoders are often used to learn features when clustering incomplete data because of their strong representation-learning ability. However, existing deep autoencoder-based clustering algorithms for incomplete data are two-stage methods that perform feature learning before clustering, ignoring the correlation between the two tasks. To ensure that the feature representations learned by the network are oriented toward the clustering task, and that the mapped features preserve the inherent structure of the input data, this paper proposes a deep embedded clustering algorithm for incomplete gene expression data based on a structure-preserving autoencoder. On the one hand, the proposed algorithm applies joint optimization to the clustering of incomplete data, iteratively alternating between feature learning and clustering optimization of the imputed data. On the other hand, rather than preserving the geometric structure of the input data only in the feature space where clustering is performed, we define Sammon's stress between the inputs and outputs so that the inherent geometric structure of the data is preserved throughout the mapping process. Experimental results on several gene expression datasets show that the proposed algorithm achieves better results in terms of both clustering performance and biological significance.
DOI:	10.23919/CCC63176.2024.10661868
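The record gives only the abstract, so the paper's exact loss, the weighting between its terms, and its imputation procedure are not shown here. As a minimal sketch, assuming the textbook forms of each ingredient, the NumPy code below computes the two pieces the summary names. The first is the classic Sammon's stress between inputs and outputs, E = (1 / Σ d*_ij) · Σ (d*_ij − d_ij)² / d*_ij over pairs i < j, where d*_ij are input distances and d_ij are output distances. The second is the standard deep embedded clustering (DEC) soft assignment, sharpened target distribution, and KL clustering loss that such methods alternate with feature learning. All function names are hypothetical, the inputs are assumed to be already imputed, and this is not the authors' implementation.

```python
import numpy as np


def pairwise_distances(a: np.ndarray) -> np.ndarray:
    """Euclidean distance matrix between the rows of `a` (n x n)."""
    diff = a[:, None, :] - a[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))


def sammon_stress(x: np.ndarray, x_hat: np.ndarray, eps: float = 1e-12) -> float:
    """Classic Sammon's stress between inputs `x` and outputs `x_hat`.

    Pairwise distances in `x` are the reference distances d*_ij and
    distances in `x_hat` are the mapped distances d_ij; only pairs i < j
    with a nonzero reference distance are counted.
    """
    iu = np.triu_indices(x.shape[0], k=1)
    ref = pairwise_distances(x)[iu]
    mapped = pairwise_distances(x_hat)[iu]
    keep = ref > eps  # skip duplicate rows to avoid dividing by zero
    ref, mapped = ref[keep], mapped[keep]
    return float(np.sum((ref - mapped) ** 2 / ref) / np.sum(ref))


def soft_assignment(z: np.ndarray, centers: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """DEC-style Student's t soft assignment q_ij of embedding i to cluster j."""
    d2 = np.sum((z[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)


def target_distribution(q: np.ndarray) -> np.ndarray:
    """Sharpened target p_ij proportional to q_ij**2 / f_j, with f_j = sum_i q_ij."""
    weight = q ** 2 / q.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)


def kl_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """KL(P || Q), the DEC clustering loss minimised while P is held fixed."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(6, 4))                   # stand-in for imputed expression profiles
    x_hat = x + 0.05 * rng.normal(size=x.shape)   # stand-in for autoencoder outputs
    z = x[:, :2]                                  # stand-in for 2-D embeddings
    centers = z[:2].copy()                        # two hypothetical cluster centres
    q = soft_assignment(z, centers)
    p = target_distribution(q)
    print("Sammon stress:", sammon_stress(x, x_hat))
    print("KL(P||Q):", kl_divergence(p, q))
```

One plausible way to combine the two in a joint objective is a weighted sum such as KL(P||Q) plus λ times the Sammon term, with P refreshed periodically; the weight λ and the update schedule are assumptions here, not details taken from the paper.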