YOLOv11 Improvements | Upsampling | Adding the Lightweight Dynamic Upsampler DySample to YOLOv11

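1. Introduction to DySample

1.1 Abstract: We present DySample, an ultra-lightweight and effective dynamic upsampler. While recent kernel-based dynamic upsamplers such as CARAFE, FADE, and SAPA have delivered impressive performance gains, they introduce a heavy workload, mostly due to the time-consuming dynamic convolution and the additional sub-network used to generate dynamic kernels. Further, the need of FADE and SAPA for high-resolution feature guidance somewhat limits their application scenarios. To address these concerns, we bypass dynamic convolution and formulate upsampling from the perspective of point sampling, which is more resource-efficient and can be easily implemented with the standard built-in functions in PyTorch. We first showcase a naive design, and then demonstrate how to strengthen its upsampling behavior step by step towards our new upsampler, DySample. Compared with former kernel-based dynamic upsamplers, DySample requires no customized CUDA package and has far fewer parameters, FLOPs, GPU memory, and latency. Besides its lightweight characteristics, DySample outperforms other upsamplers across five dense prediction tasks: semantic segmentation, object detection, instance segmentation, panoptic segmentation, and monocular depth estimation.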

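Official paper: https://arxiv.org/pdf/2308.15085

1.2 Brief overview:

DySample is an ultra-lightweight and efficient dynamic upsampler that reformulates upsampling from the perspective of point sampling. It is designed to avoid the high computational cost and complexity of existing dynamic upsamplers while delivering better accuracy and efficiency. Its design is described below: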

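(1). Basic design

Point sampling: The core idea of DySample is to treat upsampling as point sampling. Instead of generating dynamic kernels as kernel-based methods do, DySample generates offsets and uses them to resample the input feature map, which bilinear interpolation turns into a continuous signal. This is what "dynamic sampling" refers to: the sampling positions depend on the content, which keeps the operation flexible while remaining cheap.

Naive implementation: A linear layer first generates the offsets, pixel shuffle then rearranges them to the upsampled resolution, and the result is added to the original sampling grid to produce the new set of sampling points. This naive version is the starting point that the improvements below refine.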
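The offset path of the naive design can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes a single group, uses a 1x1 convolution as the "linear" layer (as the listing in section 2 does), and the tensor sizes are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

scale = 2
x = torch.rand(1, 16, 4, 6)  # (B, C, H, W), illustrative sizes

# One (x, y) offset pair for each of the scale^2 upsampled points per input position.
offset_layer = nn.Conv2d(16, 2 * scale ** 2, kernel_size=1)

offset = offset_layer(x)                     # (1, 8, 4, 6)
offset_hr = F.pixel_shuffle(offset, scale)   # (1, 2, 8, 12): one pair per output pixel
print(offset.shape, offset_hr.shape)
```

After pixel shuffle, every pixel of the upsampled grid has its own two-channel offset, which is then added to the base sampling grid.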

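(2). Improved design

Initial sampling positions: In the naive version, all the upsampled points that originate from one input position share the same initial sampling position, so the positional relationship between them is ignored, which hurts performance. DySample therefore adopts a "bilinear initialization" strategy that spreads the initial sampling positions evenly, improving accuracy.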
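One way to see what bilinear initialization buys is to sample with zero offsets: the result should then match ordinary bilinear interpolation. The sketch below checks this under stated assumptions. The helper `zero_offset_point_sample` is a hypothetical name written for this illustration, and it builds the sampling grid directly instead of going through pixel shuffle.

```python
import torch
import torch.nn.functional as F


def zero_offset_point_sample(x, scale=2):
    """Upsample x by sampling only at the evenly spread initial positions."""
    B, C, H, W = x.shape
    # Centres of the output pixels, expressed in input-pixel units.
    ys = (torch.arange(H * scale, dtype=x.dtype, device=x.device) + 0.5) / scale
    xs = (torch.arange(W * scale, dtype=x.dtype, device=x.device) + 0.5) / scale
    gy, gx = torch.meshgrid(ys, xs, indexing='ij')
    # grid_sample expects (x, y) order, normalised to [-1, 1].
    grid = torch.stack((2 * gx / W - 1, 2 * gy / H - 1), dim=-1)
    grid = grid.unsqueeze(0).expand(B, -1, -1, -1)
    return F.grid_sample(x, grid, mode='bilinear',
                         align_corners=False, padding_mode='border')


x = torch.rand(2, 3, 4, 7)
ours = zero_offset_point_sample(x, scale=2)
ref = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)
max_diff = (ours - ref).abs().max().item()
print(ours.shape, max_diff)  # the difference should be at floating-point noise level
```

For scale=2 the initial positions sit at -0.25 and +0.25 of a pixel around each input centre, which is what `_init_pos` in section 2 produces.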

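Offset range control: Because of normalization layers, output feature values typically fall within [−1, 1]. To reduce the overlap between the offset ranges of neighboring sampling points, a "static scope factor" of 0.25 is applied, which limits the offsets to [−0.25, 0.25]. In addition, a "dynamic scope factor" is introduced: a linear projection followed by a Sigmoid function adjusts the offset range at each position.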
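The two scope factors can be compared with a few lines of arithmetic. This is a sketch, not the module itself: the raw offsets are made-up values in [-1, 1], and the multiplier 0.5 after the Sigmoid is taken from the listing in section 2.

```python
import torch

raw_offset = torch.tensor([-1.0, -0.5, 0.0, 0.5, 1.0])  # illustrative values in [-1, 1]

# Static scope factor: a fixed multiplier of 0.25.
static_offset = raw_offset * 0.25

# Dynamic scope factor: a sigmoid gate in (0, 1) times 0.5, so the factor lies in (0, 0.5).
# A zero-initialised scope layer outputs 0, and sigmoid(0) = 0.5.
scope_logits = torch.zeros_like(raw_offset)
dynamic_offset = raw_offset * torch.sigmoid(scope_logits) * 0.5

print(static_offset)
print(dynamic_offset)
```

Because the scope layer in section 2 is initialised to zero, the dynamic factor starts at 0.5 * 0.5 = 0.25, so training begins from the same range as the static variant.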

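Grouped upsampling: To further improve flexibility and performance, DySample uses a grouped upsampling strategy. The feature map is split into several groups along the channel dimension, one sampling set is generated per group, and all channels within a group share it. According to the paper's experiments, grouped upsampling improves model performance.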
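In tensor terms, grouping amounts to folding the groups into the batch dimension before sampling. The sketch below shows only the shapes involved. The sampling grid is random here purely for illustration, whereas in DySample it comes from the offset branch.

```python
import torch
import torch.nn.functional as F

B, C, H, W = 2, 8, 3, 5
groups, scale = 4, 2

x = torch.rand(B, C, H, W)
# One sampling grid per (sample, group); random stand-in values in [-1, 1].
grid = torch.rand(B * groups, scale * H, scale * W, 2) * 2 - 1

# Fold the groups into the batch dimension so each group uses its own grid.
x_grouped = x.reshape(B * groups, C // groups, H, W)
out = F.grid_sample(x_grouped, grid, mode='bilinear',
                    align_corners=False, padding_mode='border')

# Unfold back to the original channel layout at the upsampled resolution.
out = out.view(B, C, scale * H, scale * W)
print(out.shape)
```

This is also why the channel count must be divisible by the number of groups.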

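(3). Dynamic upsampling process

Offset generation styles: DySample offers two ways of generating offsets: "linear + pixel shuffle" (LP) and "pixel shuffle + linear" (PL). The LP style uses more parameters in the offset layer but is more flexible. The PL style uses fewer parameters, because its linear layer runs after pixel shuffle and therefore sees only C / scale² input channels. Parameter count alone does not determine inference speed, so it is worth measuring both styles on your own model.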
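The parameter difference between the two styles can be counted directly from the offset layer. The helper `offset_layer_params` below is a hypothetical name for this illustration. Its layer shapes follow the constructor in section 2, and 64 input channels is just an example value.

```python
import torch.nn as nn


def offset_layer_params(in_channels, scale=2, groups=4, style='lp'):
    """Count the parameters of the 1x1 offset convolution for a given style."""
    if style == 'pl':
        # PL: pixel shuffle first, so the conv sees C / scale^2 channels.
        conv = nn.Conv2d(in_channels // scale ** 2, 2 * groups, 1)
    else:
        # LP: conv first, producing all scale^2 offset pairs per group.
        conv = nn.Conv2d(in_channels, 2 * groups * scale ** 2, 1)
    return sum(p.numel() for p in conv.parameters())


lp_params = offset_layer_params(64, style='lp')
pl_params = offset_layer_params(64, style='pl')
print(lp_params, pl_params)  # 2080 136
```

With scale=2 and groups=4, the LP layer has 64 * 32 weights plus 32 biases, while the PL layer has 16 * 8 weights plus 8 biases.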

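Visualizing the upsampling process: An enlarged view of a local region shows how DySample splits a point on an edge into four points during upsampling, which sharpens the boundary. The upsampled output therefore keeps fine detail and clear boundaries.

In summary, DySample achieves efficient dynamic upsampling through a series of design refinements. Its point-sampling formulation and flexible offset control let it perform well across multiple tasks while keeping compute cost low and inference fast, which makes it a highly competitive choice among existing dynamic upsamplers.

1.3 Module structure diagram

2. Core code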

import torch
import torch.nn as nn
import torch.nn.functional as F

__all__ = ['Dy_Sample']


def normal_init(module, mean=0, std=1, bias=0):
    if hasattr(module, 'weight') and module.weight is not None:
        nn.init.normal_(module.weight, mean, std)
    if hasattr(module, 'bias') and module.bias is not None:
        nn.init.constant_(module.bias, bias)


def constant_init(module, val, bias=0):
    if hasattr(module, 'weight') and module.weight is not None:
        nn.init.constant_(module.weight, val)
    if hasattr(module, 'bias') and module.bias is not None:
        nn.init.constant_(module.bias, bias)


class Dy_Sample(nn.Module):
    def __init__(self, in_channels, scale=2, style='lp', groups=4, dyscope=False):
        super().__init__()
        self.scale = scale
        self.style = style
        self.groups = groups
        assert style in ['lp', 'pl']
        if style == 'pl':
            assert in_channels >= scale ** 2 and in_channels % scale ** 2 == 0
        assert in_channels >= groups and in_channels % groups == 0

        if style == 'pl':
            in_channels = in_channels // scale ** 2
            out_channels = 2 * groups
        else:
            out_channels = 2 * groups * scale ** 2

        self.offset = nn.Conv2d(in_channels, out_channels, 1)
        normal_init(self.offset, std=0.001)
        if dyscope:
            self.scope = nn.Conv2d(in_channels, out_channels, 1)
            constant_init(self.scope, val=0.)

        self.register_buffer('init_pos', self._init_pos())

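    def _init_pos(self):
        # Bilinear initialization: evenly spaced sub-pixel start positions, e.g. [-0.25, 0.25] for scale=2.
        h = torch.arange((-self.scale + 1) / 2, (self.scale - 1) / 2 + 1) / self.scale
        # indexing='ij' matches the legacy default and avoids the meshgrid deprecation warning.
        return torch.stack(torch.meshgrid(h, h, indexing='ij')).transpose(1, 2).repeat(1, self.groups, 1).reshape(1, -1, 1, 1)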

    def sample(self, x, offset):
        B, _, H, W = offset.shape
        # Split the offset channels into (x, y) pairs: (B, 2, groups * scale^2, H, W).
        offset = offset.view(B, 2, -1, H, W)
        # Pixel-centre coordinates of the input grid, stacked in (x, y) order.
        coords_h = torch.arange(H) + 0.5
        coords_w = torch.arange(W) + 0.5
        coords = torch.stack(torch.meshgrid(coords_w, coords_h, indexing='ij')
                             ).transpose(1, 2).unsqueeze(1).unsqueeze(0).type(x.dtype).to(x.device)
        # Add the offsets and normalise to [-1, 1], as F.grid_sample expects.
        normalizer = torch.tensor([W, H], dtype=x.dtype, device=x.device).view(1, 2, 1, 1, 1)
        coords = 2 * (coords + offset) / normalizer - 1
        # Rearrange to the upsampled resolution: (B * groups, scale * H, scale * W, 2).
        coords = F.pixel_shuffle(coords.view(B, -1, H, W), self.scale).view(
            B, 2, -1, self.scale * H, self.scale * W).permute(0, 2, 3, 4, 1).contiguous().flatten(0, 1)
        # Each group of channels is sampled with its own grid.
        return F.grid_sample(x.reshape(B * self.groups, -1, H, W), coords, mode='bilinear',
                             align_corners=False, padding_mode="border").view(B, -1, self.scale * H, self.scale * W)

    def forward_lp(self, x):
        if hasattr(self, 'scope'):
            offset = self.offset(x) * self.scope(x).sigmoid() * 0.5 + self.init_pos
        else:
            offset = self.offset(x) * 0.25 + self.init_pos
        return self.sample(x, offset)

    def forward_pl(self, x):
        x_ = F.pixel_shuffle(x, self.scale)
        if hasattr(self, 'scope'):
            offset = F.pixel_unshuffle(self.offset(x_) * self.scope(x_).sigmoid(), self.scale) * 0.5 + self.init_pos
        else:
            offset = F.pixel_unshuffle(self.offset(x_), self.scale) * 0.25 + self.init_pos
        return self.sample(x, offset)

    def forward(self, x):
        if self.style == 'pl':
            return self.forward_pl(x)
        return self.forward_lp(x)


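if __name__ == '__main__':
    # Quick shape check: a 2x upsampler keeps the channel count and doubles H and W.
    x = torch.rand(2, 64, 4, 7)
    dys = Dy_Sample(64)
    print(dys(x).shape)  # torch.Size([2, 64, 8, 14])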

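3. Adding DySample to YOLOv11

3.1 Create a new Extramodule folder under ultralytics/nn

3.2 Create DySample.py in Extramodule

Add the DySample code given above to DySample.py.

Then import it in ultralytics/nn/Extramodule/__init__.py, for example with the line: from .DySample import * (the module's __all__ exports only Dy_Sample).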

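3.3 Import it in tasks.py

In ultralytics/nn/tasks.py, import the Extramodule package, for example with the line: from .Extramodule import *

In tasks.py, locate the parse_model function (press Ctrl+F and search for parse_model).

Add the following branch to the if/elif chain that dispatches on the module type m: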

        elif m in {Dy_Sample}:
            c2 = ch[f]
            args = [c2, *args]
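What this branch does can be reproduced outside of Ultralytics with plain lists. The function `dysample_branch` and the channel list below are hypothetical stand-ins for illustration, not part of tasks.py.

```python
def dysample_branch(ch, f, args):
    """Mimic the added parse_model branch for a Dy_Sample layer."""
    c2 = ch[f]               # output channels equal the channels of the source layer f
    return c2, [c2, *args]   # the channel count is prepended as in_channels


ch = [16, 32, 64, 256]  # hypothetical channel counts of the layers built so far
c2, layer_args = dysample_branch(ch, f=-1, args=[])
print(c2, layer_args)  # 256 [256]
```

So the empty argument list `[]` in the YAML becomes `Dy_Sample(256)` here, and any extra YAML arguments are passed positionally after in_channels, in the order scale, style, groups, dyscope.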

4. Create a new yolo11DySample.yaml file

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 1 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
  m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
  l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
  x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs

# YOLO11n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 9
  - [-1, 2, C2PSA, [1024]] # 10

# YOLO11n head
head:
  - [-1, 1, Dy_Sample, []]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2, [512, False]] # 13

  - [-1, 1, Dy_Sample, []]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2, [256, False]] # 16 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2, [512, False]] # 19 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 2, C3k2, [1024, True]] # 22 (P5/32-large)

  - [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)

Change nc to match the number of classes in your own dataset.
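Dy_Sample asserts that its input channel count is divisible by groups (4 by default), so it can help to check the scaled channel widths before training. The sketch below assumes that Ultralytics scales a layer's channels as min(c, max_channels) * width rounded up to a multiple of 8. The helper `scaled_channels` is a hypothetical re-implementation of that rule for illustration, not an Ultralytics function.

```python
import math


def scaled_channels(c, width, max_channels, divisor=8):
    """Assumed scaling rule: cap, multiply by width, round up to a multiple of divisor."""
    return math.ceil(min(c, max_channels) * width / divisor) * divisor


# Scale 'n' from the YAML above: [depth, width, max_channels] = [0.50, 0.25, 1024]
n_width, n_max = 0.25, 1024
first_in = scaled_channels(1024, n_width, n_max)   # layer 10 (C2PSA) feeds the first Dy_Sample
second_in = scaled_channels(512, n_width, n_max)   # layer 13 (C3k2) feeds the second Dy_Sample

groups = 4
print(first_in, first_in % groups == 0)    # 256 True
print(second_in, second_in % groups == 0)  # 128 True
```

If you switch to style='pl', the input channels must also be divisible by scale ** 2, as the assertions in the constructor show.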

5. Model training

import warnings
warnings.filterwarnings('ignore')
from ultralytics import YOLO

if __name__ == '__main__':
    model = YOLO(r'D:\yolo\yolov11\ultralytics-main\datasets\yolo11DySample.yaml')
    model.train(data=r'D:\yolo\yolov11\ultralytics-main\datasets\data.yaml',
                cache=False,
                imgsz=640,
                epochs=100,
                single_cls=False,  # whether to train as single-class detection
                batch=4,
                close_mosaic=10,
                workers=0,
                device='0',
                optimizer='SGD',
                amp=True,
                project='runs/train',
                name='exp',
                )

Once the model structure is printed, the run has started successfully.