A Detailed Introduction to the Bottleneck, CSP, and DP Structures

news2025/11/4 7:20:08

Table of Contents

Preface
I. Bottleneck
- DarknetBottleneck
II. CSP
- The CSP Idea
- CSPLayer in PP-PicoDet
DP Convolution

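Preface

This article describes three structures commonly found in neural networks: the bottleneck, CSP (Cross Stage Partial), and DP (depth-wise plus point-wise convolution). Code is included for each to aid understanding.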

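I. Bottleneck

The bottleneck block appears in deep networks such as ResNet-50/101/152. The basic idea is to first reduce the channel count with a 1x1 convolution, run the 3x3 convolution at the reduced width, and then restore the channel count with another 1x1 convolution, which cuts the amount of computation. The result is then added to the input to form the basic residual structure.
[Figure: bottleneck block]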
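To make the saving concrete, here is a minimal sketch that counts convolution weights for a bottleneck and for two plain 3x3 convolutions kept at full width. The 256-channel width and the reduction factor of 4 are assumed example values, `conv_weights` is a hypothetical helper, and biases and BatchNorm parameters are ignored.

```python
def conv_weights(in_ch, out_ch, k):
    """Number of weights in a k x k convolution (no bias, groups=1)."""
    return k * k * in_ch * out_ch


channels = 256            # block input/output width (assumed for illustration)
hidden = channels // 4    # reduced width inside the bottleneck (assumed ratio)

# Bottleneck: 1x1 reduce -> 3x3 at reduced width -> 1x1 restore
bottleneck = (conv_weights(channels, hidden, 1)
              + conv_weights(hidden, hidden, 3)
              + conv_weights(hidden, channels, 1))

# Baseline: two 3x3 convolutions kept at full width
plain = 2 * conv_weights(channels, channels, 3)

print("bottleneck weights:", bottleneck)
print("plain 3x3 + 3x3 weights:", plain)
print("ratio:", round(plain / bottleneck, 1))
```

Under these assumptions the bottleneck needs roughly one seventeenth of the weights, because the expensive 3x3 convolution only ever sees the reduced width.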

DarknetBottleneck

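Residual networks have been highly influential, and many similar structures followed. One of them is DarknetBottleneck, a submodule used inside CSP layers in YOLO-style detectors. As the code below shows, it folds the 3x3 convolution and the 1x1 channel expansion into a single step: the 3x3 convolution outputs out_channels directly. In theory this costs somewhat more computation than a three-convolution bottleneck of the same hidden width.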

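import paddle.nn as nn

# ConvBNLayer (Conv2D + BatchNorm + activation) is not shown in this article;
# DPModule is defined later in this article.


class DarknetBottleneck(nn.Layer):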
    """The basic bottleneck block used in Darknet.

    Each Block consists of two ConvModules and the input is added to the
    final output. Each ConvModule is composed of Conv, BN, and act.
    The first convLayer has filter size of 1x1 and the second one has the
    filter size of 3x3.

    Args:
        in_channels (int): The input channels of this Module.
        out_channels (int): The output channels of this Module.
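        kernel_size (int): The kernel size of the second convolution.
            Default: 3
        expansion (float): Ratio of hidden channels to out_channels.
            Default: 0.5
        add_identity (bool): Whether to add identity to the out.
            Default: True
        use_depthwise (bool): Whether to use depthwise separable convolution.
            Default: False
        act (str): The activation function name. Default: "leaky_relu"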
    """

    def __init__(self,
                 in_channels,
                 out_channels,
                 kernel_size=3,
                 expansion=0.5,
                 add_identity=True,
                 use_depthwise=False,
                 act="leaky_relu"):
        super(DarknetBottleneck, self).__init__()
        hidden_channels = int(out_channels * expansion)
        conv_func = DPModule if use_depthwise else ConvBNLayer
        self.conv1 = ConvBNLayer(
            in_channel=in_channels,
            out_channel=hidden_channels,
            kernel_size=1,
            act=act)
        self.conv2 = conv_func(
            in_channel=hidden_channels,
            out_channel=out_channels,
            kernel_size=kernel_size,
            stride=1,
            act=act)
        self.add_identity = \
            add_identity and in_channels == out_channels

    def forward(self, x):
        identity = x
        out = self.conv1(x)
        out = self.conv2(out)

        if self.add_identity:
            return out + identity
        else:
            return out
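
The extra cost of merging the 3x3 convolution with the channel expansion can be checked with a rough weight count. The sketch below compares the DarknetBottleneck layout (1x1 reduce, then a 3x3 that expands straight to out_channels) against a ResNet-style three-convolution bottleneck with the same hidden width. The 256-channel width is an assumed example, `conv_weights` is a hypothetical helper, and BatchNorm parameters are ignored.

```python
def conv_weights(in_ch, out_ch, k):
    """Number of weights in a k x k convolution (no bias, groups=1)."""
    return k * k * in_ch * out_ch


in_channels = out_channels = 256          # assumed example width
expansion = 0.5
hidden = int(out_channels * expansion)    # same formula as DarknetBottleneck

# DarknetBottleneck: 1x1 (in -> hidden), then 3x3 (hidden -> out)
darknet = (conv_weights(in_channels, hidden, 1)
           + conv_weights(hidden, out_channels, 3))

# ResNet-style: 1x1 (in -> hidden), 3x3 (hidden -> hidden), 1x1 (hidden -> out)
resnet_style = (conv_weights(in_channels, hidden, 1)
                + conv_weights(hidden, hidden, 3)
                + conv_weights(hidden, out_channels, 1))

print("DarknetBottleneck weights:", darknet)
print("ResNet-style weights:", resnet_style)
```

At equal hidden width the Darknet variant is heavier because its 3x3 convolution produces the full output width rather than the reduced one.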

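II. CSP

The CSP Idea

The CSP structure comes from CSPNet.
The two diagrams below are probably familiar to most readers by now. The core idea of CSP is to split the input into two parts along the channel dimension, apply convolutions and other operations to only one part, and then concatenate the untouched part with the output of the processed part, which is somewhat reminiscent of a residual connection. The CSPDarknet53 backbone of YOLOv4 later adopted this structure, and it has been widely influential.
[Figures: CSP structure diagrams]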
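
The split, transform, and concatenate steps can be sketched in a few lines of plain Python. This is only a toy model of the idea: each "channel" is a single number, `csp_block` and `toy_transform` are hypothetical names, and the even half-and-half split is an assumption made for illustration.

```python
def csp_block(channels, transform):
    """Split channels in two, transform one half, concatenate the result.

    channels:  list with one entry per channel
    transform: function applied only to the second half
    """
    half = len(channels) // 2
    untouched, to_process = channels[:half], channels[half:]
    processed = transform(to_process)
    # List concatenation stands in for concat along the channel axis
    return untouched + processed


def toy_transform(part):
    """Stand-in for the convolution stack: doubles every value."""
    return [2 * c for c in part]


x = [1, 2, 3, 4]
y = csp_block(x, toy_transform)
print(y)  # [1, 2, 6, 8]
```

The first half of the output is identical to the input, which is where the resemblance to a residual shortcut comes from.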

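CSPLayer in PP-PicoDet

Let's study how this layer is implemented using a piece of code from PicoDet. Rather than slicing the input channels, the layer splits the input with two 1x1 convolutions, each producing mid_channels channels. One branch (main_conv) then passes through the DarknetBottleneck blocks, and finally the two branches are concatenated and fused by a 1x1 convolution. It is simple and easy to follow.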

import paddle
import paddle.nn as nn


class CSPLayer(nn.Layer):
    """Cross Stage Partial Layer.

    Args:
        in_channels (int): The input channels of the CSP layer.
        out_channels (int): The output channels of the CSP layer.
        kernel_size (int): The kernel size of the second convolution in
            each block. Default: 3
        expand_ratio (float): Ratio to adjust the number of channels of the
            hidden layer. Default: 0.5
        num_blocks (int): Number of blocks. Default: 1
        add_identity (bool): Whether to add identity in blocks.
            Default: True
        use_depthwise (bool): Whether to use depthwise separable convolution
            in blocks. Default: False
        act (str): The activation function name. Default: "leaky_relu"
    """

    def __init__(self,
                 in_channels,
                 out_channels,
                 kernel_size=3,
                 expand_ratio=0.5,
                 num_blocks=1,
                 add_identity=True,
                 use_depthwise=False,
                 act="leaky_relu"):
        super().__init__()
        mid_channels = int(out_channels * expand_ratio)
        self.main_conv = ConvBNLayer(in_channels, mid_channels, 1, act=act)
        self.short_conv = ConvBNLayer(in_channels, mid_channels, 1, act=act)
        self.final_conv = ConvBNLayer(
            2 * mid_channels, out_channels, 1, act=act)

        self.blocks = nn.Sequential(* [
            DarknetBottleneck(
                mid_channels,
                mid_channels,
                kernel_size,
                1.0,
                add_identity,
                use_depthwise,
                act=act) for _ in range(num_blocks)
        ])

    def forward(self, x):
        x_short = self.short_conv(x)

        x_main = self.main_conv(x)
        x_main = self.blocks(x_main)

        x_final = paddle.concat((x_main, x_short), axis=1)
        return self.final_conv(x_final)
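
As a rough check on what this layer costs, the sketch below traces the channel counts through the constructor above and sums the convolution weights, then compares the total with a single plain 3x3 convolution of the same input and output width. It assumes use_depthwise=False and ignores BatchNorm parameters; `conv_weights` and `csp_layer_weights` are hypothetical helpers, and the 256-channel width is an example value.

```python
def conv_weights(in_ch, out_ch, k):
    """Number of weights in a k x k convolution (no bias, groups=1)."""
    return k * k * in_ch * out_ch


def csp_layer_weights(in_channels, out_channels, kernel_size=3,
                      expand_ratio=0.5, num_blocks=1):
    """Sum the conv weights of CSPLayer, mirroring its constructor."""
    mid = int(out_channels * expand_ratio)
    main_conv = conv_weights(in_channels, mid, 1)
    short_conv = conv_weights(in_channels, mid, 1)
    final_conv = conv_weights(2 * mid, out_channels, 1)
    # DarknetBottleneck(mid, mid, expansion=1.0):
    # 1x1 (mid -> mid) followed by k x k (mid -> mid)
    block = conv_weights(mid, mid, 1) + conv_weights(mid, mid, kernel_size)
    return main_conv + short_conv + final_conv + num_blocks * block


csp = csp_layer_weights(256, 256)
plain = conv_weights(256, 256, 3)
print("CSPLayer weights:", csp)
print("plain 3x3 conv weights:", plain)
```

With these example values the whole layer, including one bottleneck block, uses half the weights of one full-width 3x3 convolution, since the 3x3 work happens at mid_channels only.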

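DP Convolution

DP convolution stands for depth-wise and point-wise convolution, i.e. depthwise separable convolution. Whenever you see the structure in the figure below, it is a DP convolution.
[Figure: DP convolution structure]
A dw (depth-wise) convolution is a convolution whose groups parameter is set to the number of input channels. An ordinary convolution uses groups=1 by default. Suppose the input feature map is 81x81x256 (256 channels) and 128 output channels are required: each kernel then has size 3x3x256, and there are 128 kernels in total. With groups=2, each kernel has size 3x3x128 and each group holds 64 kernels (still 128 in total). By analogy, with groups=128 each kernel has size 3x3x2 and each group needs only one kernel. In general, each group reads in_channels / groups input channels and produces out_channels / groups feature maps, so the number of kernels per group equals out_channels / groups. For a depth-wise convolution the output channel count normally equals the input channel count, here 256, so with groups=256 each kernel has size 3x3x1, which amounts to one 2D convolution per channel.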
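
The relationship between groups, kernel shape, and kernel count can be tabulated with a small helper. This is a sketch following the usual definition of grouped convolution (each group sees in_channels / groups inputs and produces out_channels / groups outputs); `grouped_conv_shape` is a hypothetical name, and bias terms are ignored.

```python
def grouped_conv_shape(in_channels, out_channels, groups, k=3):
    """Describe the kernels of a k x k grouped convolution."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("groups must divide in_channels and out_channels")
    in_per_group = in_channels // groups
    return {
        "kernel_shape": (k, k, in_per_group),
        "kernels_per_group": out_channels // groups,
        "total_kernels": out_channels,
        "weights": k * k * in_per_group * out_channels,
    }


# 256 input channels, 128 output channels, increasing group counts
for g in (1, 2, 128):
    print(g, grouped_conv_shape(256, 128, g))

# Depth-wise case: in_channels == out_channels == groups
print(256, grouped_conv_shape(256, 256, 256))
```

Note that the total number of kernels always equals the number of output channels; what shrinks as groups grows is the depth of each kernel and the number of kernels inside each group.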
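A pw (point-wise) convolution is simple: it is a 1x1 convolution whose purpose is to fuse information across channels.
After the dw convolution there is no interaction between channels at all, so a pw convolution is needed to mix information across them. DP convolution greatly reduces the amount of computation and is a staple of many lightweight networks.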
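
To put a number on the reduction, the sketch below counts multiply-accumulate operations (MACs) for a standard 3x3 convolution and for a depth-wise plus point-wise pair on the 81x81x256 feature map used as the example above. It assumes stride 1 and padding that preserves the spatial size, equal input and output channels, and ignores BatchNorm and activations; both function names are hypothetical.

```python
def standard_conv_macs(h, w, in_ch, out_ch, k):
    """MACs of a k x k convolution with groups=1 on an h x w output."""
    return h * w * k * k * in_ch * out_ch


def dp_conv_macs(h, w, in_ch, out_ch, k):
    """MACs of a k x k depth-wise conv followed by a 1x1 point-wise conv."""
    depthwise = h * w * k * k * in_ch      # one k x k filter per channel
    pointwise = h * w * in_ch * out_ch     # 1x1 conv mixes the channels
    return depthwise + pointwise


h = w = 81
c = 256
std = standard_conv_macs(h, w, c, c, 3)
dp = dp_conv_macs(h, w, c, c, 3)
print("standard:", std)
print("depth-wise + point-wise:", dp)
print("reduction factor:", round(std / dp, 1))
```

Under these assumptions the pair needs roughly 8.7 times fewer MACs, and nearly all of the remaining cost sits in the point-wise convolution.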
An implementation is shown below:

import paddle.nn as nn
import paddle.nn.functional as F
from paddle import ParamAttr


class DPModule(nn.Layer):
    """
    Depth-wise and point-wise module.

    Args:
        in_channel (int): The input channels of this Module.
        out_channel (int): The output channels of this Module.
        kernel_size (int): The conv2d kernel size of this Module.
        stride (int): The conv2d's stride of this Module.
        act (str): The activation function of this Module,
                   Now support `leaky_relu` and `hard_swish`.
        use_act_in_out (bool): Whether to apply the activation after the
                   point-wise convolution as well. Default: True
    """

    def __init__(self,
                 in_channel=96,
                 out_channel=96,
                 kernel_size=3,
                 stride=1,
                 act='leaky_relu',
                 use_act_in_out=True):
        super(DPModule, self).__init__()
        initializer = nn.initializer.KaimingUniform()
        self.use_act_in_out = use_act_in_out
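        # Depth-wise conv: groups equals the channel count, so each filter sees
        # a single input channel (with in_channel == out_channel, the usual case).
        self.dwconv = nn.Conv2D(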
            in_channels=in_channel,
            out_channels=out_channel,
            kernel_size=kernel_size,
            groups=out_channel,
            padding=(kernel_size - 1) // 2,
            stride=stride,
            weight_attr=ParamAttr(initializer=initializer),
            bias_attr=False)
        self.bn1 = nn.BatchNorm2D(out_channel)
        self.pwconv = nn.Conv2D(
            in_channels=out_channel,
            out_channels=out_channel,
            kernel_size=1,
            groups=1,
            padding=0,
            weight_attr=ParamAttr(initializer=initializer),
            bias_attr=False)
        self.bn2 = nn.BatchNorm2D(out_channel)
        if act == "hard_swish":
            act = 'hardswish'
        self.act = act

    def forward(self, x):
        x = self.bn1(self.dwconv(x))
        if self.act:
            x = getattr(F, self.act)(x)
        x = self.bn2(self.pwconv(x))
        if self.use_act_in_out and self.act:
            x = getattr(F, self.act)(x)
        return x