Automatic building extraction on high-resolution remote sensing imagery using deep convolutional encoder-decoder with spatial pyramid pooling

Automatic extraction of buildings from remote sensing imagery plays a significant role in many applications, such as urban planning and monitoring changes to land cover. Various building segmentation methods have been proposed for visible remote sensing images, especially state-of-the-art methods ba...

Full description

Saved in:
Bibliographic Details
Published in:IEEE access Vol. 7; p. 1
Main Authors: Liu, Yaohui, Gross, Lutz, Li, Zhiqiang, Li, Xiaoli, Fan, Xiwei, Qi, Wenhua
Format: Journal Article
Language:English
Published: Piscataway IEEE 01.01.2019
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:
ISSN:2169-3536, 2169-3536
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Automatic extraction of buildings from remote sensing imagery plays a significant role in many applications, such as urban planning and monitoring changes to land cover. Various building segmentation methods have been proposed for visible remote sensing images, especially state-of-the-art methods based on convolutional neural networks (CNNs). However, high-accuracy building segmentation from high-resolution remote sensing imagery is still a challenging task due to the potentially complex texture of buildings in general and image background. Repeated pooling and striding operations used in CNNs reduce feature resolution causing a loss of detailed information. To address this issue, we propose a light-weight deep learning model integrating spatial pyramid pooling with an encoder-decoder structure. The proposed model takes advantage of a spatial pyramid pooling module to capture and aggregate multi-scale contextual information and of the ability of encoder-decoder networks to restore losses of information. The proposed model is evaluated on two publicly available datasets; the Massachusetts roads and buildings dataset and the INRIA Aerial Image Labeling Dataset. The experimental results on these datasets show qualitative and quantitative improvement against established image segmentation models, including SegNet, FCN, U-Net, Tiramisu, and FRRN. For instance, compared to the standard U-Net, the overall accuracy gain is 1.0% (0.913 vs. 0.904) and 3.6% (0.909 vs. 0.877) with a maximal increase of 3.6% in model-training time on these two datasets. These results demonstrate that the proposed model has the potential to deliver automatic building segmentation from high-resolution remote sensing images at an accuracy that makes it a useful tool for practical application scenarios.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:2169-3536
2169-3536
DOI:10.1109/ACCESS.2019.2940527