MULTI-K: accurate classification of microarray subtypes using ensemble k-means clustering
| Published in: | BMC bioinformatics Vol. 10; no. 1; p. 260 |
|---|---|
| Main Authors: | , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | London: BioMed Central, 22.08.2009 (BioMed Central Ltd; Springer Nature B.V; BMC) |
| Subjects: | |
| ISSN: | 1471-2105 |
| Summary: | Background
Uncovering subtypes of disease from microarray samples has important clinical implications such as survival time and sensitivity of individual patients to specific therapies. Unsupervised clustering methods have been used to classify this type of data. However, most existing methods focus on clusters with compact shapes and do not reflect the geometric complexity of the high dimensional microarray clusters, which limits their performance.
Results
We present a cluster-number-based ensemble clustering algorithm, called MULTI-K, for microarray sample classification, which demonstrates remarkable accuracy. The method amalgamates multiple k-means runs by varying the number of clusters and identifies clusters that manifest the most robust co-memberships of elements. In addition to the original algorithm, we newly devised the entropy-plot to control the separation of singletons or small clusters. MULTI-K, unlike the simple k-means or other widely used methods, was able to capture clusters with complex and high-dimensional structures accurately. MULTI-K outperformed other methods including a recently developed ensemble clustering algorithm in tests with five simulated and eight real gene-expression data sets.
Conclusion
The geometric complexity of clusters should be taken into account for accurate classification of microarray data, and ensemble clustering applied to the number of clusters tackles the problem very well. The C++ code and the data sets tested are available from the authors. |
|---|---|
| DOI: | 10.1186/1471-2105-10-260 |
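
The Results summary describes combining k-means runs over several cluster numbers and keeping groups whose co-membership is robust. The following is a minimal Python sketch of that general idea only, not the authors' C++ implementation. The function names (`kmeans`, `co_membership`, `robust_clusters`), the deterministic farthest-first initialisation, and the fixed co-membership threshold are all assumptions made for illustration; the entropy-plot mentioned in the summary is not reproduced here.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; returns one cluster label per point.

    Assumption of this sketch: centers start from a deterministic
    farthest-first traversal, so the only source of diversity in the
    ensemble is the number of clusters k.
    """
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Move each center to the mean of its members (empty clusters keep their center).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def co_membership(points, k_values):
    """Fraction of k-means runs (one per k) in which each pair shares a cluster."""
    n = len(points)
    counts = [[0] * n for _ in range(n)]
    for k in k_values:
        labels = kmeans(points, k)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(k_values) for c in row] for row in counts]


def robust_clusters(co, threshold):
    """Group points linked by co-membership >= threshold (connected components).

    The hard threshold is a hypothetical stand-in for a more careful
    cluster-extraction step.
    """
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
co = co_membership(points, k_values=[2, 3, 4])
print(robust_clusters(co, threshold=0.3))  # [0, 0, 0, 1, 1, 1]
print(robust_clusters(co, threshold=0.5))  # [0, 0, 0, 1, 1, 2]
```

On this toy input, raising the threshold from 0.3 to 0.5 splits one point off as a singleton, which illustrates why some control over the separation of singletons or small clusters is useful in an ensemble of this kind.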