Incorporating hierarchical information into multiple instance learning for patient phenotype prediction with single-cell RNA-sequencing data

Multiple instance learning (MIL) provides a structured approach to patient phenotype prediction with single-cell RNA-sequencing (scRNA-seq) data. However, existing MIL methods tend to overlook the hierarchical structure inherent in scRNA-seq data, especially the biological groupings of cells or cell...

Full description

Saved in:
Bibliographic Details
Published in:Bioinformatics (Oxford, England) Vol. 41; no. Supplement_1; pp. i96 - i104
Main Authors: Do, Chau, Lähdesmäki, Harri
Format: Journal Article
Language:English
Published: England Oxford Publishing Limited (England) 01.07.2025
Oxford University Press
Subjects:
ISSN:1367-4803, 1367-4811, 1367-4811
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Multiple instance learning (MIL) provides a structured approach to patient phenotype prediction with single-cell RNA-sequencing (scRNA-seq) data. However, existing MIL methods tend to overlook the hierarchical structure inherent in scRNA-seq data, especially the biological groupings of cells or cell types. This limitation may lead to suboptimal performance and poor interpretability at higher levels of cellular division. To address this gap, we present a novel approach to incorporate hierarchical information into the attention-based MIL framework. Specifically, our model applies the attention-based aggregation mechanism over both cells and cell types, thus enforcing a hierarchical structure on the flow of information throughout the model. Across extensive experiments, our proposed approach demonstrates highly competitive performance and shows robustness against limited sample sizes. Moreover, ablation test results show that simply applying the attention mechanism on cell types instead of cells leads to improved performance, underscoring the benefits of incorporating the hierarchical groupings. By identifying the critical cell types that are most relevant for prediction, we show that our model is capable of capturing biologically meaningful associations, suggesting its potential to facilitate biological discoveries. Our source code is available at https://github.com/minhchaudo/hier-mil. All datasets used in this study are publicly available online.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
content type line 23
ISSN:1367-4803
1367-4811
1367-4811
DOI:10.1093/bioinformatics/btaf241