Multi-modal mixed-type structural equation modeling with structured sparsity for subgroup discovery from heterogeneous health data

Bibliographic details
Published in: IISE Transactions, Vol. 57, No. 12, pp. 1497-1511
Main authors: Ding, Yu; Somers, Virend K.; Si, Bing
Format: Journal Article
Language: English
Published: United States, 1 December 2025
ISSN: 2472-5854, 2472-5862
Description
Summary: The increasing availability of health data from resources such as large biobanks, electronic healthcare records, medical tests, and wearable sensors has set the stage for the development of novel machine learning (ML) models for multi-modal mixed-type data to capture the complexity of human health and disease. Clustering is a type of ML model that aims to identify homogeneous subgroups from heterogeneous data, providing a data-driven solution to targeted, subgroup-specific study and intervention. While such data contain diverse and complementary information to facilitate decision making and improve population health, clustering of high-dimensional multi-modal mixed-type data poses major challenges to existing ML and statistical models. We propose a novel Multi-modal Mixed-type Structural Equation Model (M2-SEM) with structured sparsity to cluster heterogeneous health data for precise subgroup discovery. To accommodate a mix of continuous and categorical data modalities, we developed a novel Gauss-Hermite-enabled Expectation-Majorization-Minimization (GH-EMM) algorithm that integrates Gauss-Hermite (GH) quadrature and the Majorization-Minimization (MM) algorithm within the Expectation-Maximization (EM) framework for efficient model estimation. The proposed M2-SEM and GH-EMM are first tested in extensive simulation studies against benchmarks, and then applied to identify subgroups of individuals at low and high risk of developing adverse cardiometabolic (CM) outcomes based on a full spectrum of CM risk factors such as poor nutrition and mental health, physical inactivity, and sleep deprivation. These findings shed light on the promise of using multi-modal mixed-type health data for early identification and targeted intervention of at-risk individuals for health promotion at the population level.
DOI: 10.1080/24725854.2024.2445776