Mixed Fuzzy Clustering for Misaligned Time Series

Bibliographic Details
Published in: IEEE Transactions on Fuzzy Systems, Vol. 25, no. 6, pp. 1777-1794
Main Authors: Salgado, Catia M., Ferreira, Marta C., Vieira, Susana M.
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.12.2017
ISSN: 1063-6706, 1941-0034
Description
Summary: Data mining in medical databases often involves the comparison of time series, which represent the evolution of a physiological variable. Temporal misalignment of physiological variables can hinder the discovery of patterns and trends shared between different patients. To address this problem, this paper proposes the mixed fuzzy clustering (MFC) algorithm with the dynamic time-warping (DTW) distance. We developed the MFC algorithm by 1) incorporating the DTW distance into the standard fuzzy c-means algorithm to handle misaligned time series; 2) introducing a new dimension into the spatiotemporal clustering algorithm to handle P time-variant features; and 3) incorporating unsupervised learning of cluster-dependent attribute weights. The algorithm is designed to simultaneously cluster time-variant and time-invariant data. We demonstrate the advantages of the proposed algorithm on four synthetic datasets and in two real-world applications in intensive care units. The first application is the classification of patients who will need the administration of vasopressors, and the second is the classification of patients with a high risk of mortality. Time-variant features consist of physiological variables collected with different sampling rates at different points in time. Time-invariant features consist of patients' demographics and score records. Performance is evaluated using cluster validity measures, which show that the proposed algorithm outperforms standard fuzzy c-means.
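The first ingredient named in the summary, using a DTW distance inside a fuzzy c-means membership update, can be illustrated with a minimal sketch. This is not the authors' MFC algorithm: it omits the time-invariant features, the P-feature dimension, the attribute weights, and the prototype update. The function names `dtw_distance` and `fuzzy_memberships`, the absolute-difference local cost, the fuzzifier `m = 2.0`, and the `eps` guard are all assumptions made for illustration.

```python
import math


def dtw_distance(a, b):
    """Textbook dynamic time-warping distance between two 1-D sequences.

    Assumption: the local cost is the absolute difference |a_i - b_j|.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] also aligned with an earlier b
                cost[i][j - 1],      # b[j-1] also aligned with an earlier a
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


def fuzzy_memberships(series, prototypes, m=2.0, eps=1e-12):
    """Fuzzy c-means membership update with DTW replacing Euclidean distance.

    Returns one row per series; each row sums to 1 across the prototypes.
    eps (an assumption) avoids division by zero when a series matches a prototype.
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for s in series:
        dists = [max(dtw_distance(s, p), eps) for p in prototypes]
        row = [
            1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
            for d_i in dists
        ]
        memberships.append(row)
    return memberships
```

In this sketch, `dtw_distance([0, 1, 2, 1, 0], [0, 0, 1, 2, 1, 0])` is zero because the second sequence is a time-shifted copy of the first, which is the kind of misalignment a pointwise Euclidean comparison would penalize.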
DOI: 10.1109/TFUZZ.2016.2633375