Structure-preserving deep embedded clustering algorithm for incomplete gene expression data

Missing values inevitably appear in gene expression data, making it impossible to directly apply clustering algorithms to incomplete gene expression data to identify co-expressed genes. Deep autoencoders are often used for feature learning of data in clustering incomplete data due to their powerful...


Bibliographic details
Published in: Chinese Control Conference, pp. 8255-8261
Main authors: Wang, Zhencheng; Li, Dan
Format: Conference paper
Language: English
Published: Technical Committee on Control Theory, Chinese Association of Automation, 28 July 2024
ISSN: 1934-1768
Abstract Missing values are unavoidable in gene expression data, so clustering algorithms cannot be applied directly to incomplete gene expression data to identify co-expressed genes. Deep autoencoders are often used for feature learning when clustering incomplete data because of their strong representation-learning ability. Existing deep autoencoder-based clustering algorithms for incomplete data are two-stage algorithms that perform feature learning before clustering, ignoring the correlation between the two tasks. To ensure that the feature representations learned by the network are oriented to the clustering task, and that the mapped features preserve the inherent structure information of the input data, this paper proposes a deep embedded clustering algorithm for incomplete gene expression data based on a structure-preserving autoencoder. First, the proposed algorithm applies joint optimization to the clustering of incomplete data, iteratively alternating between feature learning and clustering optimization on the imputed data. Second, rather than preserving the geometric structure of the input data only in the feature space where clustering is performed, we define Sammon's stress between the inputs and outputs so that the inherent geometric structure information is preserved throughout the mapping process. Experimental results on several gene expression datasets show that the proposed algorithm achieves better results in terms of both clustering performance and biological significance.
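The abstract names Sammon's stress between the network inputs and outputs as the structure-preserving term but does not give the formula. As a rough illustration only, the sketch below computes the standard textbook Sammon's stress, E = (1 / sum_{i<j} d*_ij) * sum_{i<j} (d*_ij - d_ij)^2 / d*_ij, where d*_ij are pairwise Euclidean distances among the input rows and d_ij those among the output rows. The function names, the `eps` guard, and the assumption that both arrays are already complete (imputed) are choices made here for illustration and are not taken from the paper; the authors' actual loss may differ.

```python
import numpy as np


def pairwise_distances(x):
    """Euclidean distances between all rows of x; returns an (n, n) array."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def sammon_stress(x_in, x_out, eps=1e-12):
    """Standard Sammon's stress between two point sets with matching rows.

    x_in  : (n, p) array, e.g. imputed expression profiles (assumed complete).
    x_out : (n, q) array, e.g. autoencoder reconstructions of the same rows.
    eps   : hypothetical guard against division by zero for duplicate rows.
    """
    d_in = pairwise_distances(x_in)
    d_out = pairwise_distances(x_out)
    # Use each unordered pair (i < j) exactly once.
    iu = np.triu_indices(len(x_in), k=1)
    d_in, d_out = d_in[iu], d_out[iu]
    # Squared distance errors are weighted by 1 / d_in, so small input
    # distances (local structure) count more than large ones.
    weighted_error = ((d_in - d_out) ** 2 / (d_in + eps)).sum()
    return float(weighted_error / (d_in.sum() + eps))


# Toy usage on random data standing in for an imputed expression matrix.
rng = np.random.default_rng(0)
genes = rng.normal(size=(6, 4))
noisy_output = genes + 0.1 * rng.normal(size=genes.shape)
stress = sammon_stress(genes, noisy_output)
```

By construction the stress is zero whenever the outputs reproduce every pairwise distance of the inputs, and it grows as the mapping distorts those distances. In a joint-optimization setting of the kind the abstract describes, a term like this would presumably be added to the training objective alongside the clustering loss, but that combination is not specified in this record.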
Author Li, Dan
Wang, Zhencheng
Author_xml – sequence: 1
  givenname: Zhencheng
  surname: Wang
  fullname: Wang, Zhencheng
  organization: Dalian University of Technology, School of Control Science and Engineering, Dalian, P. R. China, 116024
– sequence: 2
  givenname: Dan
  surname: Li
  fullname: Li, Dan
  email: Idan@dlut.edu.cn
  organization: Dalian University of Technology, School of Control Science and Engineering, Dalian, P. R. China, 116024
ContentType Conference Proceeding
DOI 10.23919/CCC63176.2024.10661868
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE/IET Electronic Library (IEL) (UW System Shared)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
Biology
EISBN 9887581585
9789887581581
EISSN 1934-1768
EndPage 8261
ExternalDocumentID 10661868
Genre orig-research
IngestDate Wed Aug 27 02:00:20 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
PageCount 7
ParticipantIDs ieee_primary_10661868
PublicationCentury 2000
PublicationDate 2024-July-28
PublicationDateYYYYMMDD 2024-07-28
PublicationDate_xml – month: 07
  year: 2024
  text: 2024-July-28
  day: 28
PublicationDecade 2020
PublicationTitle Chinese Control Conference
PublicationTitleAbbrev CCC
PublicationYear 2024
Publisher Technical Committee on Control Theory, Chinese Association of Automation
Publisher_xml – name: Technical Committee on Control Theory, Chinese Association of Automation
SourceID ieee
SourceType Publisher
StartPage 8255
SubjectTerms Analytical models
Biology
Clustering algorithms
Correlation
Data models
Deep embedded clustering
Gene expression data
Imputation
Joint optimization
Missing value
Representation learning
Structure preserving
Title Structure-preserving deep embedded clustering algorithm for incomplete gene expression data
URI https://ieeexplore.ieee.org/document/10661868