Fuzzy Clustering and Aggregation of Relational Data With Instance-Level Constraints


Bibliographic details
Published in: IEEE Transactions on Fuzzy Systems, Vol. 16, No. 6, pp. 1565-1581
Main authors: Frigui, H., Cheul Hwang
Format: Journal Article
Language: English
Published: New York, IEEE, 1 December 2008
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 1063-6706, 1941-0034
Description
Summary: In this paper, we introduce a semisupervised approach for clustering and aggregating relational data (SS-CARD). We assume that the data is available only in relational form, that is, only the degrees to which pairs of objects in the dataset are related are known. Moreover, we assume that the relational information is represented by multiple dissimilarity matrices. These matrices could have been generated using different features, different mappings, or even different sensors. SS-CARD is designed to simultaneously aggregate pairwise distances from multiple relational matrices, partition the data into clusters, and learn a relevance weight for each matrix in each cluster. These weights have two main advantages. First, they help in partitioning the data into more meaningful clusters. Second, they can be used as part of a more complex learning system to enhance its learning behavior. SS-CARD uses partial supervision information that consists of a small set of constraints on which instances should (should-link) or should not (should-not-link) reside in the same cluster. This additional information can guide the algorithm in learning the optimal relevance weights and in generating a better partition. The performance of the proposed algorithm is illustrated by using it in two different applications. The first one consists of categorizing the discrete nominal-valued mushroom data. The second application consists of categorizing a collection of images where each image is represented by several continuous features. For both applications, we represent the pairwise dissimilarities by multiple relational matrices extracted from different feature sets. The results are compared with those obtained by three traditional relational clustering methods. We show that the partial supervision information and the learned aggregation weights can improve the results significantly.
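The two ingredients named in the summary, a per-cluster weighted aggregation of several dissimilarity matrices and a set of should-link / should-not-link constraints, can be illustrated with a minimal Python sketch. This is not the SS-CARD algorithm: the record gives no update equations, so the weights here are fixed by hand rather than learned, and the function names, toy matrices, and labels are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def aggregate_dissimilarities(matrices: Sequence[Matrix],
                              weights: Sequence[float]) -> Matrix:
    """Weighted sum of several n x n dissimilarity matrices for ONE cluster.

    Assumption: one relevance weight per matrix; in the paper these are
    learned per cluster, here they are simply supplied by the caller.
    """
    if len(matrices) != len(weights):
        raise ValueError("one weight per matrix is required")
    n = len(matrices[0])
    combined = [[0.0] * n for _ in range(n)]
    for matrix, weight in zip(matrices, weights):
        for i in range(n):
            for j in range(n):
                combined[i][j] += weight * matrix[i][j]
    return combined


def count_violations(labels: Sequence[int],
                     should_link: Sequence[Tuple[int, int]],
                     should_not_link: Sequence[Tuple[int, int]]) -> int:
    """Count pairwise constraints that a hard partition fails to satisfy."""
    violated = sum(1 for i, j in should_link if labels[i] != labels[j])
    violated += sum(1 for i, j in should_not_link if labels[i] == labels[j])
    return violated


# Toy example (hypothetical data): three objects, two feature-specific matrices.
d_color = [[0.0, 0.2, 0.9],
           [0.2, 0.0, 0.8],
           [0.9, 0.8, 0.0]]
d_texture = [[0.0, 0.6, 0.4],
             [0.6, 0.0, 0.5],
             [0.4, 0.5, 0.0]]

# Hand-picked weights for one cluster: color counts more than texture.
cluster_weights = [0.75, 0.25]
combined = aggregate_dissimilarities([d_color, d_texture], cluster_weights)

# A candidate partition checked against one constraint of each kind.
labels = [0, 0, 1]
violations = count_violations(labels,
                              should_link=[(0, 1)],
                              should_not_link=[(0, 2)])
```

In a semisupervised setting, a violation count like this would typically enter the objective as a penalty term, so that partitions and weights that respect the constraints are preferred; how SS-CARD does this exactly is described in the full paper, not in this record.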
DOI: 10.1109/TFUZZ.2008.2005692