Structure-preserving deep embedded clustering algorithm for incomplete gene expression data
| Published in: | Chinese Control Conference pp. 8255 - 8261 |
|---|---|
| Main Authors: | , |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | Technical Committee on Control Theory, Chinese Association of Automation, 28.07.2024 |
| Subjects: | |
| ISSN: | 1934-1768 |
| Summary: | Missing values inevitably occur in gene expression data, which prevents clustering algorithms from being applied directly to incomplete gene expression data to identify co-expressed genes. Owing to their strong representation-learning ability, deep autoencoders are often used to learn features when clustering incomplete data. However, existing deep autoencoder-based clustering algorithms for incomplete data are two-stage: they perform feature learning before clustering and therefore ignore the correlation between the two tasks. To ensure that the learned feature representations are oriented toward the clustering task and that the mapped features preserve the inherent structure of the input data, this paper proposes a deep embedded clustering algorithm for incomplete gene expression data based on a structure-preserving autoencoder. On the one hand, the proposed algorithm jointly optimizes the clustering of incomplete data, iteratively alternating between feature learning and clustering optimization on the imputed data. On the other hand, unlike approaches that preserve the geometric structure of the input data only in the feature space where clustering is performed, we define Sammon's stress between the inputs and outputs so that the data retain their inherent geometric structure throughout the mapping process. Experimental results on several gene expression datasets show that the proposed algorithm achieves better results in terms of both clustering performance and biological significance. |
|---|---|
| DOI: | 10.23919/CCC63176.2024.10661868 |
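The summary's structure-preserving term is Sammon's stress. As a rough illustration, the sketch below computes the classic Sammon's stress between two row-aligned point sets, assuming Euclidean distances. The summary does not say whether the "outputs" are the autoencoder's embeddings or its reconstructions, nor how the term is weighted in the training loss, so `sammon_stress` and its arguments are hypothetical names rather than the paper's code.

```python
import numpy as np


def pairwise_distances(points: np.ndarray) -> np.ndarray:
    """Euclidean distance matrix between the rows of `points`."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def sammon_stress(x: np.ndarray, y: np.ndarray, eps: float = 1e-12) -> float:
    """Classic Sammon's stress of configuration y relative to reference x.

    E = (1 / sum_{i<j} d*_ij) * sum_{i<j} (d*_ij - d_ij)^2 / d*_ij,
    where d*_ij are distances in x (inputs) and d_ij are distances in y (outputs).
    x and y must have the same number of rows but may differ in dimension.
    """
    d_in = pairwise_distances(x)
    d_out = pairwise_distances(y)
    i, j = np.triu_indices(x.shape[0], k=1)  # each unordered pair counted once
    d_star, d = d_in[i, j], d_out[i, j]
    mask = d_star > eps  # skip coincident input points to avoid dividing by zero
    d_star, d = d_star[mask], d[mask]
    return float(((d_star - d) ** 2 / d_star).sum() / d_star.sum())
```

As a sanity check under this definition, an identical copy of the inputs gives a stress of 0, and uniformly scaling the inputs by 2 gives a stress of 1. Lower values mean that the output configuration better preserves the inputs' pairwise distances, which larger distances dominate less than in a plain squared-error comparison.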
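The summary describes the method as deep embedded clustering but does not spell out its clustering objective. For orientation only, the sketch below shows the standard DEC objective of Xie et al. (2016). That objective uses Student's t soft assignments of embedded points `z` to cluster centers, a sharpened target distribution, and a KL-divergence loss. Whether this paper uses exactly this form is an assumption, and the function names are hypothetical.

```python
import numpy as np


def soft_assignment(z: np.ndarray, centers: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Student's t similarity q_ij between embedding z_i and center mu_j (rows sum to 1)."""
    sq_dist = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    q = (1.0 + sq_dist / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)


def target_distribution(q: np.ndarray) -> np.ndarray:
    """Sharpened targets p_ij proportional to q_ij^2 / f_j, where f_j = sum_i q_ij."""
    weight = q ** 2 / q.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)


def kl_clustering_loss(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """KL(P || Q) summed over all points; eps guards against log(0)."""
    return float((p * np.log((p + eps) / (q + eps))).sum())
```

In DEC, minimizing this loss pulls each embedding toward the clusters it is already confidently assigned to, which is what makes the learned features "clustering-oriented" in the sense the summary describes.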
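The summary states that the algorithm alternates between clustering and re-estimating the imputed data rather than imputing once up front. The toy loop below illustrates only that alternating pattern and is not the paper's method: plain k-means centroids are swapped in for the reconstruction a structure-preserving autoencoder would provide. Missing entries are marked with `np.nan`, and `alternate_impute_and_cluster` is a hypothetical name.

```python
import numpy as np


def alternate_impute_and_cluster(x: np.ndarray, n_clusters: int, n_iters: int = 20):
    """Toy alternating scheme: cluster the imputed data, then re-impute from the clusters.

    x: (n, d) array with np.nan marking missing entries.
    Stand-in: k-means centroids play the role that a decoder's reconstruction
    would play in an autoencoder-based method. Observed entries are never changed.
    """
    observed = ~np.isnan(x)
    filled = np.where(observed, x, np.nanmean(x, axis=0))  # initial column-mean imputation

    # Deterministic farthest-point initialization of the centers.
    centers = [filled[0]]
    for _ in range(1, n_clusters):
        nearest = np.min([((filled - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(filled[nearest.argmax()])
    centers = np.array(centers)

    labels = np.zeros(len(filled), dtype=int)
    for _ in range(n_iters):
        # Clustering step on the currently imputed data (one Lloyd iteration).
        sq_dist = ((filled[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = sq_dist.argmin(axis=1)
        for k in range(n_clusters):
            members = filled[labels == k]
            if len(members) > 0:
                centers[k] = members.mean(axis=0)
        # Imputation step: overwrite only the missing entries with cluster estimates.
        filled = np.where(observed, x, centers[labels])
    return filled, labels
```

The design point this shows is the feedback between the two tasks: the imputed values influence the cluster estimates, and the cluster estimates in turn refine the imputed values, which a two-stage impute-then-cluster pipeline cannot do.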