[PyTorch] A Summary of Commonly Used Network Layers

news2025/4/6 21:11:12

Contents

Preface
1. Convolution Layers
2. Pooling Layers
3. Padding Layers
Summary

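Preface

In PyTorch, networks are built mainly by composing layers. This article summarizes the most commonly used layer interfaces in PyTorch and their parameters.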

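1. Convolution Layers

The official PyTorch documentation lists many convolution layers. Taking PyTorch 2.4 (the latest release at the time of writing) as an example, the following convolution layers are available, depending on the dimensionality of the data being processed: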

Layer	Description
nn.Conv1d	Applies a 1D convolution over an input signal composed of several input planes.
nn.Conv2d	Applies a 2D convolution over an input signal composed of several input planes.
nn.Conv3d	Applies a 3D convolution over an input signal composed of several input planes.
nn.ConvTranspose1d	Applies a 1D transposed convolution operator over an input image composed of several input planes.
nn.ConvTranspose2d	Applies a 2D transposed convolution operator over an input image composed of several input planes.
nn.ConvTranspose3d	Applies a 3D transposed convolution operator over an input image composed of several input planes.
……	……
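The most widely used of these, the 2D convolution, serves here as the example for explaining how a convolution layer is configured.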

torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', device=None, dtype=None)

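Parameters:

in_channels (int) – number of channels in the input
out_channels (int) – number of channels produced by the convolution
kernel_size (int or tuple) – size of the convolution kernel
stride (int or tuple, optional) – stride of the convolution
padding (int, tuple or str, optional) – padding added to all four sides of the input; either a number of pixels or one of the string modes "same" and "valid"
padding_mode (str, optional) – how the padded values are filled: 'zeros', 'reflect', 'replicate' or 'circular'
dilation (int or tuple, optional) – spacing between kernel elements (dilated, or atrous, convolution)
groups (int, optional) – number of groups for grouped convolution; in_channels and out_channels must both be divisible by it
bias (bool, optional) – whether to add a learnable bias to the output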
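The parameters above can be tried out in a short sketch. The tensor size and channel counts below are arbitrary values chosen for illustration, not taken from the documentation; the layers use only the constructor arguments listed above.

```python
import torch
import torch.nn as nn

# Illustrative input: batch of 1, 16 channels, 32x32 spatial size.
x = torch.randn(1, 16, 32, 32)  # (N, C_in, H_in, W_in)

# An int argument applies to both the height and the width.
conv_int = nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1)

# A tuple sets (height, width) separately.
conv_tuple = nn.Conv2d(16, 32, kernel_size=(3, 5), stride=(2, 1), padding=(1, 2))

# padding="same" keeps the spatial size; it requires stride 1.
conv_same = nn.Conv2d(16, 32, kernel_size=3, padding="same")

# groups=4 splits the 16 input channels into 4 groups of 4 channels each.
conv_grouped = nn.Conv2d(16, 32, kernel_size=3, groups=4, bias=False)

shape_int = tuple(conv_int(x).shape)
shape_tuple = tuple(conv_tuple(x).shape)
shape_same = tuple(conv_same(x).shape)
grouped_weight_shape = tuple(conv_grouped.weight.shape)

print(shape_int, shape_tuple, shape_same, grouped_weight_shape)
```

With groups=4, each filter only sees in_channels / groups = 4 input channels, which is why the weight tensor has shape (32, 4, 3, 3) rather than (32, 16, 3, 3).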

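For converting between the input and output dimensions of a convolution layer, the official documentation gives the following formula:

Input shape: $(N,C_{in},H_{in},W_{in})$
Output shape: $(N,C_{out},H_{out},W_{out})$
Relation: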
$H_{out}=\left\lfloor\frac{H_{in}+2\times\text{padding}[0]-\text{dilation}[0]\times(\text{kernel size}[0]-1)-1}{\text{stride}[0]}+1\right\rfloor\\W_{out}=\left\lfloor\frac{W_{in}+2\times\text{padding}[1]-\text{dilation}[1]\times(\text{kernel size}[1]-1)-1}{\text{stride}[1]}+1\right\rfloor$

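kernel_size can be passed as an int or a tuple. When it is an int, both kernel size[0] and kernel size[1] in the formula take that value. The same applies to stride, padding and dilation.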
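As a quick check on the formula, the relation can be written as a small pure-Python helper. The names conv2d_output_size and as_pair are hypothetical, introduced only for this sketch; they are not part of PyTorch. It assumes integer padding (not the "same"/"valid" string modes).

```python
def as_pair(value):
    """Expand an int to (value, value); pass a 2-tuple through unchanged."""
    return (value, value) if isinstance(value, int) else tuple(value)


def conv2d_output_size(h_in, w_in, kernel_size, stride=1, padding=0, dilation=1):
    """Apply the documented Conv2d size formula to each spatial dimension."""
    k, s, p, d = (as_pair(v) for v in (kernel_size, stride, padding, dilation))
    sizes = []
    for i, n in enumerate((h_in, w_in)):
        # floor((n + 2p - d(k - 1) - 1) / s + 1); the trailing +1 is an
        # integer, so it can be moved outside the floor division.
        sizes.append((n + 2 * p[i] - d[i] * (k[i] - 1) - 1) // s[i] + 1)
    return tuple(sizes)


print(conv2d_output_size(224, 224, kernel_size=7, stride=2, padding=3))
print(conv2d_output_size(32, 32, kernel_size=(3, 5), stride=(2, 1), padding=(1, 2)))
```

For the first call the numerator is 224 + 6 - 6 - 1 = 223, and floor(223 / 2) + 1 = 112, so a 7x7 kernel with stride 2 and padding 3 halves a 224x224 input.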

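2. Pooling Layers

Pooling layers reduce the spatial resolution of the feature maps (downsampling). For 2D data, PyTorch provides the following pooling layers: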

Layer	Description
nn.MaxPool2d	Applies a 2D max pooling over an input signal composed of several input planes.
nn.MaxUnpool2d	Computes a partial inverse of MaxPool2d.
nn.AvgPool2d	Applies a 2D average pooling over an input signal composed of several input planes.
nn.FractionalMaxPool2d	Applies a 2D fractional max pooling over an input signal composed of several input planes.
nn.LPPool2d	Applies a 2D power-average pooling over an input signal composed of several input planes.
nn.AdaptiveMaxPool2d	Applies a 2D adaptive max pooling over an input signal composed of several input planes.
nn.AdaptiveAvgPool2d	Applies a 2D adaptive average pooling over an input signal composed of several input planes.

The common max pooling layer, MaxPool2d, serves here as the example; the other pooling layers are similar.

torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)

Parameters:

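kernel_size (Union[int, Tuple[int, int]]) – size of the pooling window
stride (Union[int, Tuple[int, int]]) – stride of the pooling window; defaults to kernel_size
padding (Union[int, Tuple[int, int]]) – amount of padding added to both sides
dilation (Union[int, Tuple[int, int]]) – spacing between elements of the pooling window
return_indices (bool) – whether to also return the positions of the maximum values
ceil_mode (bool) – when the division is not exact, whether to round the output size up instead of down; rounds down by default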
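A minimal sketch of how these parameters behave, using a 5x5 input chosen so that the window does not divide the input evenly. The input values are illustrative only.

```python
import torch
import torch.nn as nn

# 5x5 plane holding 0..24, so every maximum is easy to locate by eye.
x = torch.arange(25, dtype=torch.float32).reshape(1, 1, 5, 5)

pool_floor = nn.MaxPool2d(kernel_size=2)                  # stride defaults to 2
pool_ceil = nn.MaxPool2d(kernel_size=2, ceil_mode=True)   # round output size up
pool_idx = nn.MaxPool2d(kernel_size=2, return_indices=True)

out_floor = pool_floor(x)       # last row and column are dropped
out_ceil = pool_ceil(x)         # a partial window covers the last row/column
out_idx, indices = pool_idx(x)  # indices are flat positions within each plane

print(out_floor.shape, out_ceil.shape)
print(out_floor[0, 0].tolist())
print(indices[0, 0].tolist())
```

The returned indices are what nn.MaxUnpool2d expects as its second input when computing the partial inverse of the pooling operation.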

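The output size of a pooling layer is computed in the same way as for convolution, with two differences: the number of channels is unchanged, and when ceil_mode=True the floor in the formula is replaced by a ceiling.

Input shape: $(N,C,H_{in},W_{in})$
Output shape: $(N,C,H_{out},W_{out})$
Relation: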
$H_{out}=\left\lfloor\frac{H_{in}+2\times\text{padding}[0]-\text{dilation}[0]\times(\text{kernel size}[0]-1)-1}{\text{stride}[0]}+1\right\rfloor\\W_{out}=\left\lfloor\frac{W_{in}+2\times\text{padding}[1]-\text{dilation}[1]\times(\text{kernel size}[1]-1)-1}{\text{stride}[1]}+1\right\rfloor$

3. Padding Layers

PyTorch provides several padding strategies: ReflectionPad, ReplicationPad, ZeroPad, ConstantPad and CircularPad. For 2D signals, they are as follows:

Layer	Description
nn.ReflectionPad2d	Pads the input tensor using the reflection of the input boundary.
nn.ReplicationPad2d	Pads the input tensor using replication of the input boundary.
nn.ZeroPad2d	Pads the input tensor boundaries with zero.
nn.ConstantPad2d	Pads the input tensor boundaries with a constant value.
nn.CircularPad2d	Pads the input tensor using circular padding of the input boundary.

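The padding layers are simple to use: most take a single argument (nn.ConstantPad2d additionally takes the fill value). Taking nn.ZeroPad2d as an example: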

torch.nn.ZeroPad2d(padding)

Parameters:

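padding (int, tuple) – the size of the padding. If it is an int, the same padding is used on all boundaries. If it is a 4-tuple, it is interpreted as (padding_left, padding_right, padding_top, padding_bottom).
In other words, an int pads all four sides of the tensor by the same amount, while a 4-tuple sets the padding for the left, right, top and bottom sides, in that order. Note that the width (left, right) comes before the height (top, bottom).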

Here is the example from the official documentation:

>>> import torch
>>> import torch.nn as nn
>>> m = nn.ZeroPad2d(2)
>>> input = torch.randn(1, 1, 3, 3)
>>> input
tensor([[[[-0.1678, -0.4418,  1.9466],
          [ 0.9604, -0.4219, -0.5241],
          [-0.9162, -0.5436, -0.6446]]]])
>>> m(input)
tensor([[[[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000, -0.1678, -0.4418,  1.9466,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.9604, -0.4219, -0.5241,  0.0000,  0.0000],
          [ 0.0000,  0.0000, -0.9162, -0.5436, -0.6446,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]]]])
>>> # using different paddings for different sides
>>> m = nn.ZeroPad2d((1, 1, 2, 0))
>>> m(input)
tensor([[[[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
          [ 0.0000, -0.1678, -0.4418,  1.9466,  0.0000],
          [ 0.0000,  0.9604, -0.4219, -0.5241,  0.0000],
          [ 0.0000, -0.9162, -0.5436, -0.6446,  0.0000]]]])
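The other padding strategies take the same padding argument and differ only in the values they fill in. The sketch below pads one column on the left and right of a small tensor so the modes can be compared side by side. The input values and the fill value 9.0 are arbitrary; circular padding is done through torch.nn.functional.pad with mode="circular", used here as a stand-in for nn.CircularPad2d on the assumption that the module may be missing in older PyTorch releases.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Two rows, three columns: [[0, 1, 2], [3, 4, 5]].
x = torch.arange(6, dtype=torch.float32).reshape(1, 1, 2, 3)
pad = (1, 1, 0, 0)  # (left, right, top, bottom)

reflect = nn.ReflectionPad2d(pad)(x)      # mirror around the edge element
replicate = nn.ReplicationPad2d(pad)(x)   # repeat the edge element
constant = nn.ConstantPad2d(pad, 9.0)(x)  # fill with a fixed value
circular = F.pad(x, pad, mode="circular") # wrap around to the opposite side

# Compare the first row of each result.
first_rows = {
    "reflect": reflect[0, 0, 0].tolist(),
    "replicate": replicate[0, 0, 0].tolist(),
    "constant": constant[0, 0, 0].tolist(),
    "circular": circular[0, 0, 0].tolist(),
}
for name, row in first_rows.items():
    print(f"{name:>9}: {row}")
```

For the first row [0, 1, 2], reflection yields [1, 0, 1, 2, 1], replication yields [0, 0, 1, 2, 2], the constant fill yields [9, 0, 1, 2, 9], and circular padding yields [2, 0, 1, 2, 0].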