基于CSENet的高分辨率影像道路提取方法研究
Alternative Title: Research on Road Extraction Method from High-Resolution Images Based on CSENet
钟萧俊
Subtype: Master's thesis
Thesis Advisor: 刘勇
2023-05-26
Degree Grantor: Lanzhou University (兰州大学)
Place of Conferral: Lanzhou
Degree Name: Master of Science
Degree Discipline: Cartography and Geographic Information Systems
Keywords: CSENet; ConvNeXt; edge learning; connectivity learning; Transformer
Abstract

With the rapid development of satellite remote sensing technology, high-resolution remote sensing imagery has become widely available, and traditional road extraction methods designed for medium- and low-resolution imagery no longer apply to it. Meanwhile, continuous progress in deep learning has driven the flourishing of convolutional neural networks, graph neural networks, Transformer networks, generative adversarial networks, and more. In computer vision, deep convolutional neural networks have become mainstream thanks to their favorable inductive bias and computational efficiency. How to fully exploit the advantages of deep convolutional networks to achieve accurate, efficient, and real-time road extraction from high-resolution remote sensing imagery, in service of national political and economic life, has become an important problem.

This thesis analyzes the current state of deep convolutional networks in semantic segmentation and examines the difficulties of extracting roads from high-resolution remote sensing imagery with such networks. Specifically, these difficulties include poor connectivity of the extracted roads, insufficient extraction accuracy, and weak generalization of the network. To address these problems, this thesis proposes the CSENet model. CSENet adopts an encoder-decoder structure: the encoder uses ConvNeXt in place of ResNet, the LinkNet decoder serves as a shared decoder, and the road extraction task is divided into three modules: road semantic segmentation, road connectivity segmentation, and edge segmentation. Across the three segmentation modules there are four output branches: the road semantic segmentation output, connectivity segmentation outputs at distances of 1 and 2, and the edge segmentation output. A Transformer structure fuses the four outputs, with different weights adjusting each output's contribution to the final result. Based on multiple sets of experiments, the main research conclusions are as follows:

(1) Quantitative analyses were conducted on the Massachusetts Roads dataset and the DeepGlobe Roads dataset. On the Massachusetts Roads dataset, CSENet achieved a precision of 80.22%, a recall of 79.23%, an F1 score of 79.05%, and an intersection over union (IoU) of 66.45%; on the DeepGlobe Roads dataset, it achieved a precision of 80.97%, a recall of 84.12%, an F1 score of 81.62%, and an IoU of 70.18%.

(2) The backbone network in the encoder is critical to the model's feature extraction capability. After replacing ResNet34 with ConvNeXt-T as the backbone, large improvements were obtained on both the Massachusetts Roads and DeepGlobe Roads datasets, with IoU gains of 1.17% and 4.05%, respectively.

(3) Adding more learning modules to the network as auxiliary losses yielded better results. Compared with the conventional road extraction task, CSENet, which adds edge learning and connectivity learning modules as auxiliary losses and fuses the module outputs with a Transformer encoder layer, achieved better road extraction: on the Massachusetts Roads dataset, F1 improved by 0.29% and IoU by 0.41%; on the DeepGlobe Roads dataset, F1 improved by 0.74% and IoU by 1%. The extracted roads showed better connectivity and richer detail.

(4) For reducing overfitting, data augmentation by randomly covering the image was more effective than regularization by randomly dropping neurons. Compared with random dropout, covering random parts of the image with a zero matrix forces the network to learn the features of the covered area without reducing the number of activated neurons, strengthening the network's inference ability and lowering the risk of overfitting.

(5) When the inference size of an image exceeds the training size, an overlapping tiling strategy combined with Otsu's method for adaptive threshold selection yields better results. Overlapping inference reduces the influence of edge pixels on the results, and Otsu's method selects an appropriate segmentation threshold for each image to be inferred.

(6) Resampling the image to be inferred to the resolution the model was trained at, then resampling the prediction back to the original resolution, yields better inference results. When the best models trained on the Massachusetts Roads and DeepGlobe Roads datasets were transferred to the Ottawa Roads dataset, they achieved IoUs of 54.86% and 73.66%, respectively; transfer to the Minqin Roads dataset gave poor results.

Other Abstract

As satellite remote sensing technology develops rapidly, high-resolution remote sensing images are becoming widely available. However, traditional road extraction methods designed for medium to low resolutions are not suitable for high-resolution remote sensing images. At the same time, the continuous advancement of deep learning technology has led to the flourishing development of convolutional neural networks, graph neural networks, Transformer networks, generative adversarial networks, and so on. In the field of computer vision, deep convolutional neural networks have become mainstream due to their good inductive bias and computational efficiency. How to fully utilize the advantages of deep convolutional networks to achieve high-precision, high-efficiency, and real-time road extraction from high-resolution remote sensing images, serving national political and economic life, has become an important issue.

This thesis analyzes the current state of deep convolutional networks in semantic segmentation and studies the challenges of extracting roads from high-resolution images using deep convolutional networks. Specifically, these challenges include poor connectivity of the extracted roads, insufficient extraction accuracy, and poor generalization of the network. To address these issues, we propose a model called CSENet. CSENet adopts an encoder-decoder structure, in which the encoder uses ConvNeXt instead of ResNet and a shared decoder based on the LinkNet architecture is used for road extraction. The road extraction task is composed of three parts: road semantic segmentation, road connectivity segmentation, and edge segmentation. Across the three segmentation tasks there are four output branches: the road semantic segmentation output, connectivity segmentation outputs at distances of 1 and 2, and the edge segmentation output. We use a Transformer encoder layer to fuse the four outputs and adjust each output's contribution to the final result with different weights. Based on multiple experiments, the main research findings of this thesis are as follows:
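CSENet fuses its four branch outputs with a Transformer encoder layer and weights each branch's contribution. The sketch below is only a minimal stand-in for that fusion step: it replaces the Transformer with a softmax-weighted convex combination of four per-pixel probability maps, and the function name and toy weights are illustrative, not taken from the thesis.

```python
import math

def fuse_branches(branches, raw_weights):
    """Fuse per-pixel probability maps from several output branches.

    branches: equally sized 2-D lists, one per branch (road semantics,
        edge output, connectivity at distance 1 and distance 2).
    raw_weights: one raw score per branch; a softmax turns them into
        convex combination coefficients, mimicking learned per-branch
        contributions to the final result.
    """
    m = max(raw_weights)
    exps = [math.exp(r - m) for r in raw_weights]
    total = sum(exps)
    w = [e / total for e in exps]
    rows, cols = len(branches[0]), len(branches[0][0])
    return [[sum(w[b] * branches[b][i][j] for b in range(len(branches)))
             for j in range(cols)]
            for i in range(rows)]

# Four toy 2x2 probability maps, one per branch.
semantic = [[0.9, 0.1], [0.8, 0.2]]
edge     = [[0.7, 0.3], [0.6, 0.4]]
conn_d1  = [[0.8, 0.2], [0.7, 0.3]]
conn_d2  = [[0.6, 0.4], [0.5, 0.5]]
fused = fuse_branches([semantic, edge, conn_d1, conn_d2],
                      [2.0, 0.0, 1.0, 1.0])
```

Because the coefficients are convex, the fused map stays a valid probability map; in the actual model the Transformer layer learns richer, content-dependent interactions between branches.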

(1) Quantitative analyses were conducted on the Massachusetts Roads Dataset and the DeepGlobe Road Extraction Dataset. On the Massachusetts Roads Dataset, the CSENet model achieved a precision of 80.22%, a recall of 79.23%, an F1 score of 79.05%, and an intersection over union (IOU) of 66.45%. On the DeepGlobe Road Extraction Dataset, the CSENet model achieved a precision of 80.97%, a recall of 84.12%, an F1 score of 81.62%, and an IOU of 70.18%.
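The four reported scores follow the standard pixel-level definitions; for reference, here is a plain-Python computation of precision, recall, F1, and IoU from binary road masks (standard formulas, not code from the thesis):

```python
def road_metrics(pred, truth):
    """Pixel-level precision, recall, F1, and IoU for binary masks.

    pred, truth: equally sized 2-D lists of 0/1 road labels.
    """
    tp = fp = fn = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            if p and t:
                tp += 1          # predicted road, truly road
            elif p and not t:
                fp += 1          # predicted road, actually background
            elif t and not p:
                fn += 1          # missed road pixel
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, f1, iou

# Toy 2x2 example: one hit, one false alarm, one miss.
p, r, f1, iou = road_metrics([[1, 1], [0, 0]], [[1, 0], [1, 0]])
```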

(2) The feature extraction capability of the backbone network in the encoder is crucial for the model. Replacing the ResNet34 backbone network with ConvNeXt-T led to significant improvements on both datasets, with IOU gains of 1.17% on the Massachusetts Roads Dataset and 4.05% on the DeepGlobe Road Extraction Dataset.

(3) Adding more learning modules to the network as auxiliary losses yielded better results. Compared with the conventional road extraction task, adding edge learning and connectivity learning modules as auxiliary losses and fusing the outputs of all modules using the Transformer encoder layer improved road extraction. On the Massachusetts Roads Dataset, the F1 score and IOU increased by 0.29% and 0.41%, respectively; on the DeepGlobe Road Extraction Dataset, they increased by 0.74% and 1%, respectively. The extracted roads showed better connectivity and richer detail.
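A common way to wire in auxiliary losses is a weighted sum with the main segmentation loss. The sketch below shows this pattern with binary cross-entropy; the lambda weights are placeholders for illustration, not values from the thesis:

```python
import math

def bce(preds, targets, eps=1e-7):
    """Mean binary cross-entropy over flat lists of probabilities/labels."""
    return -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                for p, t in zip(preds, targets)) / len(preds)

def total_loss(seg, edge, conn1, conn2, lambdas=(1.0, 0.5, 0.5, 0.5)):
    """Main road-segmentation loss plus three auxiliary losses.

    Each argument is a (predictions, targets) pair of flat lists; the
    lambda weights balance the main and auxiliary terms and are
    illustrative only.
    """
    terms = (seg, edge, conn1, conn2)
    return sum(l * bce(p, t) for l, (p, t) in zip(lambdas, terms))

# Perfect predictions give a near-zero loss; uncertain ones do not.
perfect = ([1.0, 0.0], [1, 0])
uncertain = ([0.5, 0.5], [1, 0])
good = total_loss(perfect, perfect, perfect, perfect)
bad = total_loss(uncertain, uncertain, uncertain, uncertain)
```

The auxiliary terms only shape training; at inference time the fused output, not the individual losses, produces the road map.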

(4) The data augmentation method of randomly covering the image was more effective than the regularization method of random neuron dropout. Compared with random dropout, using a zero matrix to randomly cover parts of the image forces the network to learn the features of the covered area without reducing the number of activated neurons, thereby enhancing the inference ability of the network and reducing overfitting.
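The zero-matrix covering idea is closely related to random erasing / cutout augmentation; a minimal sketch of the concept (not the thesis's exact implementation — patch placement and sizing are simplified):

```python
import random

def random_zero_cover(image, patch_h, patch_w, rng=None):
    """Cover a randomly placed patch_h x patch_w block of a 2-D image
    with zeros, as a data-augmentation step.

    Returns a copy; unlike dropout, no neurons are deactivated — the
    network must instead infer the covered region from its context.
    """
    rng = rng or random.Random()
    h, w = len(image), len(image[0])
    top = rng.randrange(h - patch_h + 1)
    left = rng.randrange(w - patch_w + 1)
    out = [row[:] for row in image]
    for i in range(top, top + patch_h):
        for j in range(left, left + patch_w):
            out[i][j] = 0
    return out

# Cover a 2x2 patch of a 4x4 all-ones image.
img = [[1] * 4 for _ in range(4)]
masked = random_zero_cover(img, 2, 2, random.Random(0))
```

In a real pipeline this would run on multi-channel tensors inside the data loader, alongside the usual flips and crops.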

(5) When the inference size of the image was larger than its training size, using the overlapping strategy and the Otsu method to adaptively select the threshold could produce better results. The overlapping inference strategy could reduce the impact of edge pixels on the inference results, while the Otsu method could select an appropriate segmentation threshold for each image to be inferred.
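Otsu's method picks the threshold that maximizes the between-class variance of the predicted probability histogram. A plain-Python sketch of the standard algorithm, applied here to per-pixel road probabilities (the binning scheme is an illustrative choice):

```python
def otsu_threshold(values, bins=256):
    """Otsu's method: choose the threshold maximising between-class
    variance for a flat list of values in [0, 1).

    Returns a float threshold; pixels >= threshold become foreground.
    """
    hist = [0] * bins
    for v in values:
        hist[min(int(v * bins), bins - 1)] += 1
    total = len(values)
    sum_all = sum(i * hist[i] for i in range(bins))
    best_t, best_var = 0, -1.0
    w_bg = sum_bg = 0
    for t in range(bins):
        w_bg += hist[t]              # background pixel count so far
        if w_bg == 0:
            continue
        w_fg = total - w_bg          # remaining foreground pixels
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return (best_t + 1) / bins

# Bimodal toy sample: background near 0.1, roads near 0.9.
vals = [0.1] * 50 + [0.9] * 50
th = otsu_threshold(vals)
```

Because the threshold is computed per image, it adapts to each scene's probability distribution instead of using a fixed 0.5 cutoff.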

(6) Using resampling to match the resolution of the image to be inferred with the resolution of the training data, and then resampling the inference results to the original resolution, could produce better results. The best models trained on the Massachusetts Roads Dataset and the DeepGlobe Road Extraction Dataset were transferred to the Ottawa Road Dataset, achieving IOU values of 54.86% and 73.66%, respectively. However, the transferred models performed poorly on the Minqin Road Dataset.
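The resolution-matching step amounts to resampling the image before inference and resampling the prediction back afterwards. A minimal nearest-neighbour sketch of the idea (real pipelines would use a raster/GIS library with proper interpolation; this only shows the round trip):

```python
def resample_nearest(image, new_h, new_w):
    """Nearest-neighbour resampling of a 2-D image (list of lists)."""
    h, w = len(image), len(image[0])
    return [[image[min(int(i * h / new_h), h - 1)]
                  [min(int(j * w / new_w), w - 1)]
             for j in range(new_w)]
            for i in range(new_h)]

# Upsample to the "training resolution", then map back.
small = [[1, 2], [3, 4]]
big = resample_nearest(small, 4, 4)    # image fed to the model
back = resample_nearest(big, 2, 2)     # prediction at original resolution
```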

Subject Area: Remote sensing image information extraction based on deep learning
MOST Discipline Catalogue: Science - Geography - Cartography and Geographic Information Systems
Language: Chinese
Other Code: 262010_220200944731
Document Type: Thesis
Identifier: https://ir.lzu.edu.cn/handle/262010/538904
Collection: College of Earth and Environmental Sciences
Affiliation: College of Earth and Environmental Sciences, Lanzhou University (兰州大学资源环境学院)
Recommended Citation
GB/T 7714
钟萧俊. 基于CSENet的高分辨率影像道路提取方法研究[D]. 兰州: 兰州大学, 2023.
Files in This Item:
There are no files associated with this item.
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.