DMFC-UFormer: Depthwise multi-scale factorized convolution transformer-based UNet for medical image segmentation
| Published in: | Biomedical Signal Processing and Control, Vol. 101, p. 107200 |
|---|---|
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Ltd, 01.03.2025 |
| ISSN: | 1746-8094 |
| Summary: | Medical image segmentation provides a crucial foundation for cancer diagnosis. Transformers are adept at capturing global context and complex dependencies, while CNNs are efficient at local feature extraction and hierarchical learning but struggle with long-range dependencies. In this paper, we combine the benefits of both methodologies and propose DMFC-UFormer, a fusion of a Depthwise Multi-Scale Factorized Convolution-based transformer (DMFC-Transformer) with UNet. The DMFC-Transformer integrates two sub-transformer blocks: the first employs Multi-Scale Factorized Feature Extraction (MFFE) to enhance the diversity of feature representations across levels, and the second uses Depthwise Multi-Scale Factorized Convolution (DMFC) to capture a broader range of patterns and variations. An Enhanced Contextual Feature Integration (ECFI) block is incorporated after each transition to capture contextual features and facilitate segmentation at each stage. A Spatial-Channel Partitioned Feature Attention (SCPFA) module replaces conventional stacked modules at the bottleneck, expanding the receptive field and augmenting feature diversity. An Attention-based Feature Stabilization (AFS) module is integrated into the skip connections to ensure global interaction and highlight important semantic features passed from the encoder to the decoder. To assess the versatility of the network, we evaluated DMFC-UFormer on medical image segmentation datasets spanning diverse imaging modalities, including wireless capsule endoscopy (WCE), colonoscopy, and dermoscopic images. DMFC-UFormer achieves Dice coefficients (DCs) of 92.14%, 89.99%, 90.47%, and 82.39% on the MICCAI 2017 (Red Lesion), PH2, CVC-ClinicalDB, and ISIC 2017 datasets, respectively, outperforming the second-ranked methods by margins of 0.83%, 1.22%, 0.57%, and 0.13% in DC.
• We propose the DMFC-UFormer architecture for medical image segmentation. • DMFC-UFormer captures both local and global context for precise segmentation. • The SCPFA module expands the receptive field, boosting feature diversity in segmentation. • ECFI blocks are integrated after each transition, effectively capturing contextual features. • AFS enhances global interaction and highlights semantic features in skip connections. (An illustrative sketch of the DMFC operator follows this record.) |
|---|---|
| ISSN: | 1746-8094 |
| DOI: | 10.1016/j.bspc.2024.107200 |
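The abstract names the Depthwise Multi-Scale Factorized Convolution (DMFC) as the core operator of the second sub-transformer block but does not spell out its layout. The sketch below is a minimal PyTorch interpretation of that name only, assuming depthwise convolutions factorized into 1×k and k×1 passes at a few scales, a pointwise fusion, and a residual connection; the class name `DMFCBlock`, the kernel sizes, and the normalization/activation choices are illustrative assumptions, not the authors' published design.

```python
# Hypothetical sketch of a "Depthwise Multi-Scale Factorized Convolution" (DMFC) block,
# inferred from the component name in the abstract. Kernel sizes, BatchNorm/GELU, and
# the residual connection are assumptions, not the authors' exact architecture.
import torch
import torch.nn as nn


class DMFCBlock(nn.Module):
    """Depthwise convolutions at several scales, each factorized into 1xk and kx1 passes."""

    def __init__(self, channels: int, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList()
        for k in kernel_sizes:
            pad = k // 2
            self.branches.append(nn.Sequential(
                # Factorized depthwise pair: 1xk then kx1; groups=channels keeps it depthwise.
                nn.Conv2d(channels, channels, kernel_size=(1, k), padding=(0, pad), groups=channels),
                nn.Conv2d(channels, channels, kernel_size=(k, 1), padding=(pad, 0), groups=channels),
                nn.BatchNorm2d(channels),
                nn.GELU(),
            ))
        # Pointwise fusion of the concatenated multi-scale responses back to `channels`.
        self.fuse = nn.Conv2d(channels * len(kernel_sizes), channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        multi_scale = torch.cat([branch(x) for branch in self.branches], dim=1)
        return x + self.fuse(multi_scale)  # residual connection (assumed)


if __name__ == "__main__":
    feats = torch.randn(1, 64, 56, 56)   # e.g. one encoder feature map
    print(DMFCBlock(64)(feats).shape)    # torch.Size([1, 64, 56, 56])
```

The factorization keeps the per-scale cost roughly linear in the kernel size (two 1D depthwise passes instead of one k×k pass), which is consistent with the abstract's emphasis on capturing diverse patterns at multiple scales without heavy convolutions, though the exact trade-offs used in the paper are not stated here.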