Contents
- Preface
- 1. Convolution Layers
- 2. Pooling Layers
- 3. Padding Layers
- Summary
Preface
Building networks in PyTorch is mostly a matter of composing layers. This article summarizes the layer interfaces most commonly used in PyTorch and their parameters.
1. Convolution Layers
The official PyTorch documentation describes many convolution layers. Taking the latest release, PyTorch 2.4, as an example, the following layers are available depending on the dimensionality of the data being processed:
| Layer | Description |
| --- | --- |
| nn.Conv1d | Applies a 1D convolution over an input signal composed of several input planes. |
| nn.Conv2d | Applies a 2D convolution over an input signal composed of several input planes. |
| nn.Conv3d | Applies a 3D convolution over an input signal composed of several input planes. |
| nn.ConvTranspose1d | Applies a 1D transposed convolution operator over an input image composed of several input planes. |
| nn.ConvTranspose2d | Applies a 2D transposed convolution operator over an input image composed of several input planes. |
| nn.ConvTranspose3d | Applies a 3D transposed convolution operator over an input image composed of several input planes. |
| … | … |
Taking the most widely used 2D convolution as an example, the basic interface of a convolution layer is as follows:
torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', device=None, dtype=None)
Parameters:
- in_channels (int) – number of input channels
- out_channels (int) – number of output channels
- kernel_size (int or tuple) – size of the convolution kernel
- stride (int or tuple, optional) – stride of the convolution
- padding (int, tuple or str, optional) – amount of padding added to all four sides of the input; either a number of pixels or a mode string such as "same" or "valid"
- padding_mode (str, optional) – type of padding (e.g. 'zeros')
- dilation (int or tuple, optional) – spacing between kernel elements for dilated (atrous) convolution
- groups (int, optional) – number of groups for grouped convolution
- bias (bool, optional) – whether to add a learnable bias
For converting between the input and output dimensions of a convolution layer, the official documentation gives the following formulas:
- Input shape: $(N, C_{in}, H_{in}, W_{in})$
- Output shape: $(N, C_{out}, H_{out}, W_{out})$
- Conversion:

$$
H_{out}=\left\lfloor\frac{H_{in}+2\times\text{padding}[0]-\text{dilation}[0]\times(\text{kernel\_size}[0]-1)-1}{\text{stride}[0]}+1\right\rfloor
$$
$$
W_{out}=\left\lfloor\frac{W_{in}+2\times\text{padding}[1]-\text{dilation}[1]\times(\text{kernel\_size}[1]-1)-1}{\text{stride}[1]}+1\right\rfloor
$$
kernel_size can be passed as an int or a tuple; if an int is given, both kernel_size[0] and kernel_size[1] in the formula take that value. The other int-or-tuple parameters (stride, padding, dilation) behave the same way.
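To see the formula in action, here is a minimal sketch; the 1×3×32×32 input and the layer hyperparameters are illustrative choices, not values from the documentation:

```python
import torch
import torch.nn as nn

# Illustrative hyperparameters: 3x3 kernel, stride 2, padding 1.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3,
                 stride=2, padding=1)

x = torch.randn(1, 3, 32, 32)  # (N, C_in, H_in, W_in)
y = conv(x)

# H_out = floor((32 + 2*1 - 1*(3-1) - 1) / 2 + 1) = floor(16.5) = 16
print(y.shape)  # torch.Size([1, 16, 16, 16])
```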
2. Pooling Layers
Pooling layers downsample feature maps to reduce their dimensionality. For 2D inputs, PyTorch provides the following pooling layers:
| Layer | Description |
| --- | --- |
| nn.MaxPool2d | Applies a 2D max pooling over an input signal composed of several input planes. |
| nn.MaxUnpool2d | Computes a partial inverse of MaxPool2d. |
| nn.AvgPool2d | Applies a 2D average pooling over an input signal composed of several input planes. |
| nn.FractionalMaxPool2d | Applies a 2D fractional max pooling over an input signal composed of several input planes. |
| nn.LPPool2d | Applies a 2D power-average pooling over an input signal composed of several input planes. |
| nn.AdaptiveMaxPool2d | Applies a 2D adaptive max pooling over an input signal composed of several input planes. |
| nn.AdaptiveAvgPool2d | Applies a 2D adaptive average pooling over an input signal composed of several input planes. |
Taking the common max pooling layer as an example (the others are similar):
torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)
Parameters:
- kernel_size (Union[int, Tuple[int, int]]) – size of the pooling window
- stride (Union[int, Tuple[int, int]]) – stride of the pooling window
- padding (Union[int, Tuple[int, int]]) – padding size
- dilation (Union[int, Tuple[int, int]]) – spacing between pooling window elements
- return_indices (bool) – whether to also return the locations of the max values
- ceil_mode (bool) – whether to round the output size up or down when the division is not exact; defaults to rounding down (floor)
The output size of a pooling layer is computed exactly as for a convolution layer:
- Input shape: $(N, C_{in}, H_{in}, W_{in})$
- Output shape: $(N, C_{out}, H_{out}, W_{out})$
- Conversion:

$$
H_{out}=\left\lfloor\frac{H_{in}+2\times\text{padding}[0]-\text{dilation}[0]\times(\text{kernel\_size}[0]-1)-1}{\text{stride}[0]}+1\right\rfloor
$$
$$
W_{out}=\left\lfloor\frac{W_{in}+2\times\text{padding}[1]-\text{dilation}[1]\times(\text{kernel\_size}[1]-1)-1}{\text{stride}[1]}+1\right\rfloor
$$
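A minimal sketch (shapes chosen for illustration): a 2×2 max pool with stride 2 halves each spatial dimension, and with return_indices=True the recorded locations can be fed to nn.MaxUnpool2d:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 16, 16)  # (N, C, H, W)

# H_out = floor((16 + 0 - 1*(2-1) - 1) / 2 + 1) = 8
pool = nn.MaxPool2d(kernel_size=2, stride=2)
print(pool(x).shape)  # torch.Size([1, 16, 8, 8])

# return_indices=True also returns the argmax locations,
# which nn.MaxUnpool2d consumes to place the values back.
pool_i = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
out, idx = pool_i(x)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)
print(unpool(out, idx).shape)  # torch.Size([1, 16, 16, 16])
```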
3. Padding Layers
PyTorch provides several padding strategies: ReflectionPad, ReplicationPad, ZeroPad, ConstantPad, and CircularPad. For 2D inputs, they are described as follows:
| Layer | Description |
| --- | --- |
| nn.ReflectionPad2d | Pads the input tensor using the reflection of the input boundary. |
| nn.ReplicationPad2d | Pads the input tensor using replication of the input boundary. |
| nn.ZeroPad2d | Pads the input tensor boundaries with zero. |
| nn.ConstantPad2d | Pads the input tensor boundaries with a constant value. |
| nn.CircularPad2d | Pads the input tensor using circular padding of the input boundary. |
The padding layers are simple to use and take only a single argument. Using nn.ZeroPad2d as an example:
torch.nn.ZeroPad2d(padding)
Parameters:
- padding (int, tuple) – the size of the padding. If is int, uses the same padding in all boundaries. If a 4-tuple, uses (padding_left, padding_right, padding_top, padding_bottom)
If padding is an int, the same amount of padding is added on all four sides of the tensor; if a 4-tuple is passed, its elements control the padding on the left, right, top, and bottom respectively.
Consider the example from the official documentation:
>>> m = nn.ZeroPad2d(2)
>>> input = torch.randn(1, 1, 3, 3)
>>> input
tensor([[[[-0.1678, -0.4418, 1.9466],
[ 0.9604, -0.4219, -0.5241],
[-0.9162, -0.5436, -0.6446]]]])
>>> m(input)
tensor([[[[ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, -0.1678, -0.4418, 1.9466, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.9604, -0.4219, -0.5241, 0.0000, 0.0000],
[ 0.0000, 0.0000, -0.9162, -0.5436, -0.6446, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000]]]])
>>> # using different paddings for different sides
>>> m = nn.ZeroPad2d((1, 1, 2, 0))
>>> m(input)
tensor([[[[ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
[ 0.0000, -0.1678, -0.4418, 1.9466, 0.0000],
[ 0.0000, 0.9604, -0.4219, -0.5241, 0.0000],
[ 0.0000, -0.9162, -0.5436, -0.6446, 0.0000]]]])
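For comparison, the other padding strategies differ only in which values they place in the border. A minimal sketch (the 3×3 input is an illustrative choice):

```python
import torch
import torch.nn as nn

x = torch.arange(9, dtype=torch.float32).reshape(1, 1, 3, 3)

print(nn.ZeroPad2d(1)(x))         # border filled with zeros
print(nn.ReflectionPad2d(1)(x))   # border mirrors the interior (edge not repeated)
print(nn.ReplicationPad2d(1)(x))  # border repeats the edge values
```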
Summary
This article covered the layers used most often in PyTorch, focusing on their interfaces and parameters. For reasons of space, the many remaining layers are best studied in the official PyTorch documentation.