Bibliographic Details
| Title: |
Superior Image Segmentation Performance with a Hybrid CNN-ViT Architecture |
| Authors: |
Yethiraj N G, Sidappa M
| Publisher Information: |
Acta Scientiae Journal (Q2, Scopus-indexed), 2025.
| Publication Year: |
2025 |
| Subject Terms: |
Vision Transformers, Deep Learning, Convolutional Neural Networks, Semantic Segmentation, Hybrid Architecture, Image Segmentation |
| Description: |
Image segmentation is a foundational task in computer vision, involving the precise pixel-wise delineation of objects. While deep convolutional neural networks (CNNs) have shown remarkable success in this domain, their inherently local receptive fields often limit their ability to capture the long-range dependencies and global context that are critical for accurate segmentation in complex scenes. Conversely, the Vision Transformer (ViT) architecture excels at modeling global relationships but can struggle to capture the fine-grained local features essential for precise boundaries. This paper introduces a novel hybrid deep learning architecture that combines a multi-scale CNN-based encoder with a Transformer-based contextual module. Our proposed model, the Hybrid Segmentation Network (HSN), fuses the rich hierarchical features extracted by the CNN with the global contextual understanding provided by the ViT. Through extensive experiments on two diverse datasets, the ISIC 2017 medical image dataset and the Pascal VOC 2012 natural scene dataset, we demonstrate that HSN outperforms state-of-the-art models, including pure CNNs (U-Net, DeepLabV3+) and pure ViTs. The results show significant improvements in key metrics such as the Dice coefficient and mean Intersection over Union (mIoU), highlighting the efficacy of our approach in achieving more precise and robust segmentation, with broad implications for autonomous driving, medical imaging, and industrial automation.
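The abstract describes the architecture and metrics only at a high level, so the two sketches below are illustrative reconstructions, not code from the paper. First, a minimal PyTorch sketch of the general hybrid pattern the abstract names (a CNN encoder for local features, a Transformer module for global context, and a pixel-wise decoder); the class name, channel widths, and layer depths are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class HybridSegNet(nn.Module):
    """Illustrative hybrid CNN-ViT segmentation model (not the paper's HSN)."""

    def __init__(self, num_classes=2, channels=64):
        super().__init__()
        # CNN encoder: local, hierarchical features, downsampling 4x overall
        self.encoder = nn.Sequential(
            nn.Conv2d(3, channels, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Transformer contextual module: global self-attention over feature tokens
        layer = nn.TransformerEncoderLayer(d_model=channels, nhead=4, batch_first=True)
        self.context = nn.TransformerEncoder(layer, num_layers=2)
        # Decoder head: upsample back to input resolution, classify each pixel
        self.head = nn.Sequential(
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(channels, num_classes, 1),
        )

    def forward(self, x):
        f = self.encoder(x)                    # (B, C, H/4, W/4) local features
        b, c, h, w = f.shape
        tokens = f.flatten(2).transpose(1, 2)  # (B, H*W/16, C) token sequence
        tokens = self.context(tokens)          # model long-range dependencies
        f = tokens.transpose(1, 2).reshape(b, c, h, w)
        return self.head(f)                    # (B, num_classes, H, W) logits
```

Second, the two headline evaluation metrics, the Dice coefficient and mean Intersection over Union (mIoU), are standard and can be computed as follows; the epsilon smoothing term and the skip-absent-class convention are common choices, not taken from the paper.

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2|A∩B| / (|A| + |B|) for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def mean_iou(pred, target, num_classes):
    """mIoU: per-class IoU = |A∩B| / |A∪B|, averaged over classes present."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:  # class absent from both masks; skip it
            continue
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))
```

For example, `mean_iou(model(x).argmax(1).numpy(), labels, 21)` would score a Pascal VOC 2012 prediction over its 21 classes (20 object classes plus background).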
| Document Type: |
Article |
| Language: |
English |
| DOI: |
10.5281/zenodo.17179391; 10.5281/zenodo.17179392
| Rights: |
CC BY |
| Accession Number: |
edsair.doi.dedup.....04c1949ce51c0e47b439e234bcc4dcec |
| Database: |
OpenAIRE |