Structure-preserving deep embedded clustering algorithm for incomplete gene expression data

Detailed bibliography
Published in: Chinese Control Conference, pp. 8255-8261
Main authors: Wang, Zhencheng; Li, Dan
Format: Conference paper
Language: English
Publisher: Technical Committee on Control Theory, Chinese Association of Automation, 28 July 2024
ISSN:1934-1768
Description
Summary: Missing values inevitably occur in gene expression data, which prevents clustering algorithms from being applied directly to incomplete gene expression data to identify co-expressed genes. Because of their strong representation-learning ability, deep autoencoders are often used to learn features when clustering incomplete data. However, existing deep autoencoder-based clustering algorithms for incomplete data are two-stage methods that perform feature learning before clustering and therefore ignore the correlation between the two tasks. To ensure that the learned feature representations are oriented toward the clustering task and that the mapped features preserve the inherent structure of the input data, this paper proposes a deep embedded clustering algorithm for incomplete gene expression data based on a structure-preserving autoencoder. First, the proposed algorithm jointly optimizes the clustering of incomplete data, iteratively alternating between feature learning and clustering optimization on the imputed data. Second, unlike approaches that preserve the geometric structure of the input data only in the feature space where clustering is performed, we define Sammon's stress between the inputs and outputs so that the inherent geometric structure of the data is preserved throughout the entire mapping process. Experimental results on several gene expression datasets show that the proposed algorithm achieves better results in terms of both clustering performance and biological significance.
DOI:10.23919/CCC63176.2024.10661868
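The summary names Sammon's stress, defined between the inputs and outputs, as the structure-preserving term. As a rough illustration only, the sketch below computes the classical Sammon's stress, E = (1 / sum d*_ij) * sum (d*_ij - d_ij)^2 / d*_ij over all pairs i < j, where d*_ij are distances between input rows and d_ij are distances between the corresponding output rows. The function names `pairwise_distances` and `sammon_stress`, the Euclidean metric, the assumption that the inputs are already imputed, and the choice to skip zero-distance input pairs are all hypothetical; the summary does not say how the paper computes distances, handles missing entries, or weights this term against its reconstruction and clustering losses.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def pairwise_distances(points: Matrix) -> List[List[float]]:
    """Hypothetical helper: symmetric matrix of Euclidean distances between all rows."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            dist[i][j] = dist[j][i] = d
    return dist


def sammon_stress(inputs: Matrix, outputs: Matrix, eps: float = 1e-12) -> float:
    """Hypothetical sketch of classical Sammon's stress between paired rows.

    Assumptions (not from the paper): `inputs` are already imputed, distances are
    Euclidean, and input pairs closer than `eps` are skipped to avoid dividing by 0.
    """
    if len(inputs) != len(outputs):
        raise ValueError("inputs and outputs must have the same number of rows")
    d_in = pairwise_distances(inputs)
    d_out = pairwise_distances(outputs)
    scale = 0.0  # normalizer: sum of input-space distances
    total = 0.0  # squared distance errors, each weighted by 1 / input distance
    n = len(inputs)
    for i in range(n):
        for j in range(i + 1, n):
            d_star = d_in[i][j]
            if d_star <= eps:
                continue
            scale += d_star
            total += (d_star - d_out[i][j]) ** 2 / d_star
    return total / scale if scale > 0.0 else 0.0


if __name__ == "__main__":
    x = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
    print(sammon_stress(x, x))  # 0.0
    doubled = [[2 * v for v in row] for row in x]
    print(sammon_stress(x, doubled))  # nonzero: pairwise distances changed
```

Identical inputs and outputs give a stress of 0, and the value grows as pairwise distances among the outputs drift from those among the inputs. Dividing each squared error by the input distance d*_ij gives small input distances more weight, which is why Sammon's stress tends to emphasize preserving local neighborhood structure.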