Training Diffusion Enhancement for Cloud Removal in Ultra-Resolution Remote Sensing Imagery

2025/4/9 10:03:35

GitHub - littlebeen/Cloud-removal-model-collection: A collection of the existing end-to-end cloud removal model
readme

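Diffusion Enhancement for Cloud Removal

An ADM-based diffusion enhancement algorithm for cloud removal in ultra-resolution remote sensing imagery.

Several conventional CR models are available at https://github.com/littlebeen/Cloud-removal-model-collection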

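Usage

Training

Pure diffusion: respace.py: gaussian_diffusion; unet.py: UnetModel

Frozen diffusion + train WA: gaussian_diffusion_enhance; unet.py: UnetModel256; the freeze is at line 74 of train_util.py

Train everything: change line 74 of train_util.py

Testing

Put the pretrained model in `pre_train`, then run: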
python super_res_sample.py
Weights

Models pretrained on RICE2 with mn and mdsa have been uploaded to Baidu Netdisk (extraction code: bean).

CUHK-CR
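A new multispectral cloud removal dataset.
Download link: Baidu Netdisk (extraction code: bean).

-CUHK-CR1 (RGB images of the thin-cloud dataset CUHK-CR1)

-CUHK-CR2 (RGB images of the thick-cloud dataset CUHK-CR2)

-Near-infrared (near-infrared images of CUHK-CR1 and CUHK-CR2)
If you need 4-band images (RGB + near-infrared), load the matching images from the RGB dataset and the near-infrared dataset and combine the 4 channels.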
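The channel stacking described above can be sketched with NumPy. This is a minimal illustration, not part of the dataset's tooling. It assumes the RGB image is already loaded as an (H, W, 3) array and the near-infrared image as an (H, W) or (H, W, C) array of the same size. The function name `stack_rgb_nir` is hypothetical.

```python
import numpy as np


def stack_rgb_nir(rgb: np.ndarray, nir: np.ndarray) -> np.ndarray:
    """Combine an (H, W, 3) RGB array and a NIR array into (H, W, 4)."""
    if nir.ndim == 2:
        # Give a single-band image an explicit channel axis.
        nir = nir[..., np.newaxis]
    if rgb.shape[:2] != nir.shape[:2]:
        raise ValueError(f"size mismatch: RGB {rgb.shape[:2]} vs NIR {nir.shape[:2]}")
    # Keep only the first NIR channel, in case it was stored as 3-channel grayscale.
    return np.concatenate([rgb, nir[..., :1]], axis=-1)
```

How the two files are read (PIL, OpenCV, rasterio, ...) is left open here, since the source does not say which loader the repository uses.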

  File "super_res_train.py", line 124, in <module>
    main()
  File "super_res_train.py", line 27, in main
    dist_util.setup_dist()
  File "D:\learn\txhf\DDPM-Enhancement-for-Cloud-Removal-main\guided_diffusion\dist_util.py", line 42, in setup_dist
    dist.init_process_group(backend=backend, init_method="env://")
  File "D:\an\anaconda\envs\inpaint\lib\site-packages\torch\distributed\distributed_c10d.py", line 602, in init_process_group
    default_pg = _new_process_group_helper(
  File "D:\an\anaconda\envs\inpaint\lib\site-packages\torch\distributed\distributed_c10d.py", line 727, in _new_process_group_helper
    raise RuntimeError("Distributed package doesn't have NCCL " "built in")
RuntimeError: Distributed package doesn't have NCCL built in

First you need a working CUDA environment. Then install the required packages:

pip install blobfile
pip install mpi4py

There are two scripts, one for training and one for testing. Run training first.

python super_res_train.py
python super_res_sample.py

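Running it fails with:

ImportError: DLL load failed while importing MPI: The specified module could not be found.

The machine has no MPI runtime installed. Download it from https://www.microsoft.com/en-us/download/details.aspx?id=57467 and install it to the default location on drive C; it is small.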
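To confirm the MPI fix before starting a long training run, a small import check can help. The sketch below uses only the standard library. The helper name `can_import` is hypothetical.

```python
import importlib


def can_import(module_name: str) -> bool:
    """Return True if the module imports cleanly, False on ImportError."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        # On Windows a missing MPI runtime surfaces here as "DLL load failed".
        print(f"cannot import {module_name}: {exc}")
        return False
    return True
```

Calling `can_import("mpi4py.MPI")` after installing the runtime should return True if the DLL is now found.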

The next run fails with:

RuntimeError: No CUDA GPUs are available

In super_res_train.py, change os.environ["CUDA_VISIBLE_DEVICES"] = "1" to "0". On a machine with a single GPU, that GPU has index 0, so selecting index 1 hides every device.
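The same setting can be wrapped in a small helper so the index is not hard-coded. This is a sketch under the assumption that it runs before CUDA is initialized; `use_gpu` is a hypothetical name, not something the repository provides.

```python
import os


def use_gpu(index: int = 0) -> str:
    """Expose a single physical GPU to this process and return the value set."""
    if index < 0:
        raise ValueError("GPU index must be non-negative")
    # CUDA reads this variable when it initializes, so it is safest to call
    # this before importing torch.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)
    return os.environ["CUDA_VISIBLE_DEVICES"]
```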

The next run fails with:

   raise RuntimeError("Distributed package doesn't have NCCL " "built in")
RuntimeError: Distributed package doesn't have NCCL built in

Windows does not support the NCCL backend; the original code was probably developed on Linux.
Add the following to super_res_train.py:

import os
os.environ["PL_TORCH_DISTRIBUTED_BACKEND"] = "gloo"

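Also change the line that raises the error, in guided_diffusion/dist_util.py. (The environment variable above is read by PyTorch Lightning rather than by torch.distributed, so the edit below is likely the one that actually takes effect.)

dist.init_process_group(backend=backend, init_method="env://")

becomes

dist.init_process_group(backend='gloo', init_method="env://")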
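Rather than hard-coding 'gloo', the backend can be chosen from the platform, so the same file keeps working on Linux. This is a minimal sketch; `pick_backend` is a hypothetical helper, and its result would be passed as the `backend` argument of `dist.init_process_group`.

```python
import sys


def pick_backend(platform: str = sys.platform, cuda_available: bool = True) -> str:
    """Choose a torch.distributed backend name for the given platform."""
    if not platform.startswith("linux"):
        # NCCL is only available on Linux builds of PyTorch; Gloo works on Windows.
        return "gloo"
    # On Linux, NCCL is the usual choice for GPU training; use Gloo on CPU.
    return "nccl" if cuda_available else "gloo"
```

In dist_util.py this could be used as `dist.init_process_group(backend=pick_backend(), init_method="env://")`, assuming the rest of setup_dist is left unchanged.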

The next run fails with:

FileNotFoundError: [Errno 2] No such file or directory: './guided_diffusion/cloudnet/mn/pretrain/mn2.pth'

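The pretrained weights are missing. Download them from Baidu Netdisk, place them under this path, and rename the file (model + dataset) so that the path matches exactly: './guided_diffusion/cloudnet/mn/pretrain/mn2.pth'

What I did instead was hard-code the path. In guided_diffusion/diff/gaussian_diffusion_enhance.py, I replaced

self.cloudnet.load_state_dict(th.load('./guided_diffusion/cloudnet/'+model+'/pretrain/'+model+'2.pth'),strict=True)

with an r-prefixed path (an absolute path is safest):

self.cloudnet.load_state_dict(th.load(r'weight/ema_0.9999_010000.pt.pth'),strict=True)

.pt and .pth files are essentially the same, so the extension can be renamed freely.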
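If you keep the repository's original layout, it helps to check the expected file up front and get a clearer message than the raw FileNotFoundError. The sketch below rebuilds the path from the same pieces as the original `th.load` call. The function name `cloudnet_weight_path` and its `root` parameter are hypothetical.

```python
from pathlib import Path


def cloudnet_weight_path(model: str, root: str = "./guided_diffusion/cloudnet") -> Path:
    """Build the path the loader expects and fail early if the file is missing."""
    path = Path(root) / model / "pretrain" / f"{model}2.pth"
    if not path.is_file():
        raise FileNotFoundError(f"put the pretrained weights at {path.resolve()}")
    return path
```

For the error above, `cloudnet_weight_path("mn")` checks ./guided_diffusion/cloudnet/mn/pretrain/mn2.pth relative to the current working directory.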

The next run fails with:

raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for MPRNet:

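According to the official documentation for load_state_dict, the strict parameter defaults to True. It controls whether the keys in state_dict must exactly match the keys returned by the module's own state_dict(). In the failing code in guided_diffusion/diff/gaussian_diffusion_enhance.py, change strict=True to strict=False. Note that strict=False only skips the key check: parameters whose keys do not match are left at their initial values.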

if(data=='RICE1'):
    # self.cloudnet.load_state_dict(th.load('./guided_diffusion/cloudnet/'+model+'/pretrain/'+model+'rice2.pth'),strict=True)
    self.cloudnet.load_state_dict(
        th.load('./guided_diffusion/cloudnet/' + 'pretrain/' + model + '_rice2.pth'),
        strict=True)  # change to False
    print(model+'1 is load')
elif(data=='RICE2'):
    # self.cloudnet.load_state_dict(th.load('./guided_diffusion/cloudnet/'+model+'/pretrain/'+model+'rice2.pth'),strict=True)
    self.cloudnet.load_state_dict(
        th.load('./guided_diffusion/cloudnet/' + 'pretrain/' + model + '_rice2.pth'),
        strict=True)  # change to False
    print(model+'2 is load')

Changing the argument to False is enough.
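Before relaxing the check, it can be worth seeing which keys actually disagree. The sketch below mimics the comparison in plain Python over key names. It is not PyTorch's implementation, and `diff_state_keys` is a hypothetical helper.

```python
def diff_state_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, the two groups strict=True reports."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    # Keys the model expects but the checkpoint does not provide.
    missing = sorted(model_keys - checkpoint_keys)
    # Keys the checkpoint provides but the model has no parameter for.
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected
```

Assuming the checkpoint is a flat state dict, it could be called as `diff_state_keys(self.cloudnet.state_dict().keys(), th.load(path).keys())`. If both lists are long, the file probably belongs to a different network, and strict=False would leave most of the model uninitialized.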

The two downloaded models and the dataset still could not be loaded, so I replaced them all with others.

FileNotFoundError: The system cannot find the path specified: './pre_train'

Create a pre_train folder under DDPM-Enhancement-for-Cloud-Removal-main.

The next error:

    work.wait()
RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.

This is probably related to multi-GPU execution. Comment out the failing call in guided_diffusion/train_util.py:

       self._load_and_sync_parameters()

The dataset cannot be found, so the dataset path needs fixing:

  File "D:\an\anaconda\envs\inpaint\lib\site-packages\blobfile\_context.py", line 353, in scandir
    raise FileNotFoundError(f"The system cannot find the path specified: '{path}'")
FileNotFoundError: The system cannot find the path specified: '../data/RICE2/train/cloud'

In the DDPM-Enhancement-for-Cloud-Removal-main root directory, create a data folder and place the data inside it following the expected layout.
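The two folder-related errors above (pre_train and the dataset directory) can be caught in one pre-flight check. In this sketch the default list contains only the two paths that appear in the error messages; any other sub-folders the data loader needs are not known from this write-up and would have to be added. `missing_dirs` is a hypothetical helper.

```python
from pathlib import Path


def missing_dirs(root, required=("pre_train", "data/RICE2/train/cloud")):
    """Return the required sub-directories that do not exist under root."""
    root = Path(root)
    return [rel for rel in required if not (root / rel).is_dir()]
```

Note that the traceback shows the relative path '../data/RICE2/train/cloud', so where the data folder must sit depends on the directory the script is launched from.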