SAM2Med3D: Leveraging video foundation models for 3D breast MRI segmentation

Published in: Computers & Graphics, Vol. 132, Art. 104341
Main authors: Chen, Ying; Cui, Wenjing; Dong, Xiaoyan; Zhou, Shuai; Wang, Zhongqiu
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.11.2025
ISSN: 0097-8493
Description
Summary: Foundation models such as the Segment Anything Model 2 (SAM2) have demonstrated impressive generalization across natural image domains. However, their potential in volumetric medical imaging remains largely underexplored, particularly under limited data conditions. In this paper, we present SAM2Med3D, a novel multi-stage framework that adapts a general-purpose video foundation model for accurate and consistent 3D breast MRI segmentation by treating a 3D MRI scan as a sequence of 2D images. Unlike existing image-based approaches (e.g., MedSAM) that require large-scale medical data for fine-tuning, our method combines a lightweight, task-specific segmentation network with a video foundation model, achieving strong performance with only modest training data. To guide the foundation model effectively, we introduce a novel spatial filtering strategy that identifies reliable slices from the initial segmentation to serve as high-quality prompts. Additionally, we propose a confidence-driven fusion mechanism that adaptively integrates coarse and refined predictions across the volume, mitigating segmentation drift and ensuring both local accuracy and global volumetric consistency. We validate SAM2Med3D on two multi-center breast MRI datasets, including both public and self-collected data. Experimental results demonstrate that our method outperforms both task-specific segmentation networks and recent foundation-model-based methods, achieving superior accuracy and inter-slice consistency.

Highlights:
• Leverages a video foundation model and a task-specific model for 3D MRI segmentation.
• Proposes a spatial filtering strategy to select reliable initial segmentations as prompts.
• Introduces confidence-driven fusion to ensure 3D consistency.
• Achieves accurate 3D segmentation on multi-center datasets.
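
The record contains no code, so the following is only a rough Python sketch of the prompt-selection idea described in the abstract: score each slice of a coarse probability volume and pick the most reliable one to prompt SAM2. The function name and the scoring criterion (foreground size weighted by distance from the 0.5 decision boundary) are illustrative assumptions, not the paper's actual spatial filtering strategy.

```python
import numpy as np

def select_prompt_slice(coarse_probs: np.ndarray) -> int:
    """Pick the axial slice whose coarse prediction looks most reliable.

    coarse_probs: (Z, H, W) per-voxel foreground probabilities from the
    lightweight task-specific network. Hypothetical criterion: favor
    slices with a sizable foreground whose probabilities sit far from
    the 0.5 decision boundary (i.e., where the network is confident).
    """
    scores = []
    for z in range(coarse_probs.shape[0]):
        p = coarse_probs[z]
        fg = p > 0.5
        if not fg.any():                # empty slice: useless as a prompt
            scores.append(0.0)
            continue
        confidence = np.abs(p[fg] - 0.5).mean() * 2.0   # in [0, 1]
        scores.append(float(fg.sum()) * confidence)
    return int(np.argmax(scores))
```

The mask on the selected slice would then be handed to SAM2's video predictor, which propagates it through the remaining slices as if they were frames of a video.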
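Likewise, a minimal sketch of what a confidence-driven fusion of coarse and refined predictions could look like, assuming both are per-voxel probability volumes of the same shape; the per-voxel weighting below is a plausible stand-in, not the mechanism from the paper.

```python
import numpy as np

def confidence_fusion(coarse: np.ndarray, refined: np.ndarray) -> np.ndarray:
    """Blend coarse and SAM2-refined probability volumes voxel-wise.

    Where the refined (video-propagated) prediction is decisive, trust
    it; where it hovers near 0.5 (e.g., due to propagation drift far
    from the prompt slice), fall back toward the coarse network.
    """
    w = np.abs(refined - 0.5) * 2.0           # refined confidence in [0, 1]
    fused = w * refined + (1.0 - w) * coarse  # adaptive per-voxel blend
    return (fused > 0.5).astype(np.uint8)     # final binary 3D mask
```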
DOI: 10.1016/j.cag.2025.104341