Multi-modal mixed-type structural equation modeling with structured sparsity for subgroup discovery from heterogeneous health data
| Published in: | IISE Transactions, Vol. 57, No. 12, pp. 1497-1511 |
|---|---|
| Main authors: | |
| Medium: | Journal Article |
| Language: | English |
| Published: | United States, 1 December 2025 |
| Subjects: | |
| ISSN: | 2472-5854, 2472-5862 |
| Abstract: | The increasing availability of health data from resources such as large biobanks, electronic healthcare records, medical tests, and wearable sensors has set the stage for the development of novel machine learning (ML) models for multi-modal mixed-type data that capture the complexity of human health and disease. Clustering is a type of ML model that aims to identify homogeneous subgroups in heterogeneous data, providing a data-driven basis for targeted, subgroup-specific study and intervention. While such data contain diverse and complementary information that can facilitate decision making and improve population health, clustering high-dimensional multi-modal mixed-type data poses major challenges to existing ML and statistical models. We propose a novel Multi-modal Mixed-type Structural Equation Model (M2-SEM) with structured sparsity to cluster heterogeneous health data for precise subgroup discovery. To accommodate a mix of continuous and categorical data modalities, we developed a novel Gauss-Hermite-enabled Expectation-Majorization-Minimization (GH-EMM) algorithm that integrates GH quadrature and the Majorization-Minimization (MM) algorithm within the Expectation-Maximization (EM) framework for efficient model estimation. The proposed M2-SEM and GH-EMM are first tested in extensive simulation studies against benchmark methods, and then applied to identify subgroups of individuals at low and high risk of developing adverse cardiometabolic (CM) outcomes based on a full spectrum of CM risk factors such as poor nutrition, poor mental health, physical inactivity, and sleep deprivation. These findings shed light on the promise of using multi-modal mixed-type health data for the early identification of at-risk individuals and for targeted interventions that promote health at the population level. |
| DOI: | 10.1080/24725854.2024.2445776 |
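The abstract names Gauss-Hermite (GH) quadrature as the ingredient GH-EMM uses to handle categorical modalities, where the likelihood of a categorical indicator must be integrated over a continuous latent factor without a closed form. As a generic illustration only, and not the authors' GH-EMM, the sketch below approximates the marginal probability of a single binary indicator driven by one standard-normal latent factor. It assumes a probit link, chosen purely so the quadrature result can be checked against the known closed form Φ(a / √(1 + b²)); the function names and the parameters `a`, `b`, and `n_nodes` are hypothetical.

```python
import math

import numpy as np


def gauss_hermite_expectation(f, mu=0.0, sigma=1.0, n_nodes=40):
    """Approximate E[f(Z)] for Z ~ N(mu, sigma^2) by Gauss-Hermite quadrature.

    hermgauss returns nodes x_i and weights w_i for the weight function
    exp(-x^2); substituting z = mu + sqrt(2) * sigma * x turns the Gaussian
    expectation into (1 / sqrt(pi)) * sum_i w_i * f(z_i).
    """
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    z = mu + math.sqrt(2.0) * sigma * x
    return float(np.sum(w * f(z)) / math.sqrt(math.pi))


def normal_cdf(t):
    # Standard normal CDF (the probit inverse link), applied elementwise.
    return 0.5 * (1.0 + np.vectorize(math.erf)(np.asarray(t) / math.sqrt(2.0)))


def marginal_prob_binary(a, b, n_nodes=40):
    # Hypothetical probit indicator: P(y = 1 | z) = Phi(a + b * z), z ~ N(0, 1).
    # The marginal P(y = 1) is obtained by integrating z out numerically.
    return gauss_hermite_expectation(lambda z: normal_cdf(a + b * z), n_nodes=n_nodes)


if __name__ == "__main__":
    a, b = 0.5, 1.2
    approx = marginal_prob_binary(a, b)
    # Closed form for the probit-normal integral: Phi(a / sqrt(1 + b^2)).
    exact = float(normal_cdf(a / math.sqrt(1.0 + b * b)))
    print(f"quadrature: {approx:.10f}  closed form: {exact:.10f}")
```

With a logistic or multinomial link, which is more common for categorical data, no such closed form exists, and that is the situation in which a fixed set of quadrature nodes and weights becomes a practical substitute for the exact integral.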