INet: Convolutional Networks for Biomedical Image Segmentation

Bibliographic Details
Published in:	IEEE Access, Vol. 9, pp. 16591-16603
Main Authors:	Weng, Weihao; Zhu, Xin
Format:	Journal Article
Language:	English
Published:	Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2021
Subjects:	Biomedical image; Biomedical imaging; Coders; Convolution; convolutional networks; encoder–decoder networks; Feature extraction; Feature maps; Image segmentation; Kernel; Kernels; Medical imaging; semantic segmentation; Semantics; Spatial data; Spatial resolution; Tumors
ISSN:	2169-3536

Description
Summary:	Encoder-decoder networks are state-of-the-art approaches to biomedical image segmentation, but they have two problems: the widely used pooling operations may discard spatial information, and low-level semantics are consequently lost. Feature fusion methods can mitigate these problems, but feature maps of different scales cannot be easily fused because down- and upsampling change the spatial resolution of the feature maps. To address these issues, we propose INet, which enlarges receptive fields by increasing the kernel sizes of convolutional layers in steps (e.g., from $3\times 3$ to $7\times 7$ and then $15\times 15$) instead of downsampling. Inspired by the Inception module, INet extracts features with kernels of different sizes by concatenating the output feature maps of all preceding convolutional layers. We also find that the large kernel makes the network feasible for biomedical image segmentation. In addition, INet uses two overlapping max-poolings, i.e., max-poolings with stride 1, to extract the sharpest features. Fixed-size and fixed-channel feature maps enable INet to concatenate feature maps and add multiple shortcuts across layers. In this way, INet can recover low-level semantics by concatenating the feature maps of all preceding layers and expedite training by adding multiple shortcuts. Because INet has additional residual shortcuts, we compare INet with a UNet system that also has residual shortcuts (ResUNet). To confirm INet as a backbone architecture for biomedical image segmentation, we implement dense connections on INet (called DenseINet) and compare it to a DenseUNet system with residual shortcuts (ResDenseUNet). INet and DenseINet require 16.9% and 37.6% fewer parameters than ResUNet and ResDenseUNet, respectively.
In comparison with six encoder-decoder approaches on nine public datasets, INet and DenseINet demonstrate efficient improvements in biomedical image segmentation. INet outperforms DeepLabV3, which implements atrous convolution instead of downsampling to increase receptive fields. INet also outperforms two recent methods (HRNet and MS-NAS) that maintain high-resolution representations and repeatedly exchange information across resolutions.
DOI:	10.1109/ACCESS.2021.3053408
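
Note on the summary: the claim that growing kernel sizes can enlarge the receptive field without changing spatial resolution can be checked with standard output-size and receptive-field arithmetic. The sketch below is not the authors' implementation. It assumes "same" padding, a 256-pixel input width, and a hypothetical baseline of three 3x3 convolutions each followed by 2x2 pooling with stride 2; only the kernel sizes 3, 7, and 15 come from the summary, and the function names are made up for illustration.

```python
def same_padding(kernel):
    """Padding that keeps the size unchanged for an odd kernel at stride 1."""
    return (kernel - 1) // 2


def output_size(size, kernel, stride, padding):
    """Standard convolution/pooling output-size formula along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def trace(size, layers):
    """Return (output size, receptive field) after each (kernel, stride, padding) layer."""
    receptive_field, jump = 1, 1
    history = []
    for kernel, stride, padding in layers:
        size = output_size(size, kernel, stride, padding)
        # Each layer widens the receptive field by (kernel - 1) input-space steps.
        receptive_field += (kernel - 1) * jump
        # A stride greater than 1 increases the spacing between neighbouring outputs.
        jump *= stride
        history.append((size, receptive_field))
    return history


# Kernel sizes taken from the summary (3, 7, 15); stride 1 and "same" padding are assumed.
growing_kernels = [(k, 1, same_padding(k)) for k in (3, 7, 15)]

# Hypothetical encoder-style baseline: 3x3 convolution, then 2x2 pooling with stride 2, three times.
conv_then_pool = [(3, 1, 1), (2, 2, 0)] * 3

INPUT_SIZE = 256  # assumed input width, chosen only for illustration

growing_trace = trace(INPUT_SIZE, growing_kernels)
pooled_trace = trace(INPUT_SIZE, conv_then_pool)

print(growing_trace)  # [(256, 3), (256, 9), (256, 23)]
print(pooled_trace)   # last entry: (32, 22)
```

Under these assumptions the stride-1 stack reaches a receptive field of 23 pixels while every feature map stays 256 wide, so the maps can be concatenated directly. The pooled baseline reaches a comparable receptive field of 22 pixels only after shrinking the map to 32 wide.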