MSC-CSMC: A multi-objective semi-supervised clustering algorithm based on constraints selection and multi-source constraints for gene expression data

Detailed bibliography
Published in: Frontiers in Genetics, Vol. 14, article 1135260
Main authors: Wang, Zeyuan; Gu, Hong; Zhao, Minghui; Li, Dan; Wang, Jia
Medium: Journal Article
Language: English
Published: Frontiers Media S.A., Switzerland, 27 February 2023
ISSN: 1664-8021
Description
Abstract: Many clustering techniques have been proposed to group genes based on gene expression data. Among these methods, semi-supervised clustering techniques aim to improve clustering performance by incorporating supervisory information in the form of pairwise constraints. However, constraint sets obtained from real unlabeled datasets inevitably contain noisy constraints, which degrade the performance of semi-supervised clustering. Moreover, existing methods do not integrate multiple information sources into multi-source constraints to improve clustering quality. To address these issues, this study proposes a new multi-objective semi-supervised clustering algorithm based on constraints selection and multi-source constraints (MSC-CSMC) for unlabeled gene expression data. The method first forms multi-source constraints from the gene expression data and from the Gene Ontology (GO), which describes gene annotation information. The multi-source constraints are then applied to clustering by improving the constraint violation penalty weight in the semi-supervised clustering objective function. Furthermore, constraints selection and cluster prototypes are placed in a multi-objective evolutionary framework using a mixed chromosome encoding strategy; through synergistic optimization, this selects pairwise constraints suited to the clustering task and reduces the negative influence of noisy constraints. The proposed MSC-CSMC algorithm is evaluated on five benchmark gene expression datasets, and the results show that it achieves superior performance.
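The abstract does not give the exact objective function, fusion rule, or chromosome layout, so the following Python sketch only illustrates the general idea under stated assumptions; it is not the authors' method. It assumes a k-means-style compactness term; a penalty for each selected must-link or cannot-link constraint that the current assignment violates, weighted by a hypothetical convex combination (`alpha`) of an expression-based confidence and a GO-based similarity; and a flat "mixed" chromosome holding real-valued prototype coordinates followed by one 0/1 selection bit per candidate constraint. All names (`Constraint`, `multi_source_weight`, `decode`, `objectives`, `alpha`) are illustrative.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Constraint:
    """A pairwise constraint between two genes (hypothetical fields)."""
    i: int             # index of the first gene
    j: int             # index of the second gene
    must_link: bool    # True = must-link, False = cannot-link
    expr_conf: float   # assumed confidence derived from expression data, in [0, 1]
    go_sim: float      # assumed GO-based semantic similarity, in [0, 1]


def multi_source_weight(c: Constraint, alpha: float = 0.5) -> float:
    """Hypothetical fusion rule: convex combination of the two sources."""
    return alpha * c.expr_conf + (1.0 - alpha) * c.go_sim


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def decode(chromosome: Sequence[float], k: int, dim: int, n_constraints: int):
    """Split a flat mixed chromosome into k real-valued prototypes of length
    dim, followed by one 0/1 selection bit per candidate constraint."""
    if len(chromosome) != k * dim + n_constraints:
        raise ValueError("chromosome length does not match k*dim + n_constraints")
    prototypes = [list(chromosome[c * dim:(c + 1) * dim]) for c in range(k)]
    mask = [bool(b) for b in chromosome[k * dim:]]
    return prototypes, mask


def objectives(chromosome, k, dim, data, constraints, alpha=0.5) -> Tuple[float, float]:
    """Return (compactness, weighted constraint-violation penalty)."""
    prototypes, mask = decode(chromosome, k, dim, len(constraints))
    # Assign each gene to its nearest prototype.
    labels: List[int] = [
        min(range(k), key=lambda c: sq_dist(x, prototypes[c])) for x in data
    ]
    compactness = sum(sq_dist(x, prototypes[lab]) for x, lab in zip(data, labels))
    penalty = 0.0
    for selected, con in zip(mask, constraints):
        if not selected:
            continue  # deselected (possibly noisy) constraints are ignored
        same = labels[con.i] == labels[con.j]
        if (con.must_link and not same) or (not con.must_link and same):
            penalty += multi_source_weight(con, alpha)
    return compactness, penalty


# Toy usage: two well-separated groups of 2-D expression profiles.
data = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
constraints = [
    Constraint(0, 1, True, 0.9, 0.7),   # consistent with the data
    Constraint(1, 2, True, 0.4, 0.2),   # conflicts with the data (possibly noisy)
]
print(objectives([0.0, 0.0, 5.0, 5.0, 1, 1], 2, 2, data, constraints))
print(objectives([0.0, 0.0, 5.0, 5.0, 1, 0], 2, 2, data, constraints))
```

Clearing the selection bit of the conflicting must-link removes its penalty without changing compactness, which is the intuition behind letting the evolutionary search choose constraints together with prototypes. A multi-objective optimizer (for example NSGA-II, an assumption since the abstract does not name one) would evolve such chromosomes against objectives of this kind.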
This article was submitted to Computational Genomics, a section of the journal Frontiers in Genetics
Edited by: Suyan Tian, Jilin University, China; Changjing Zhuge, Beijing University of Technology, China
Reviewed by: Guojun Liu, Xi’an University of Finance and Economics, China
DOI: 10.3389/fgene.2023.1135260