文章目录
- 1.卷积层-Convolution Layers
- 1.1 1d/2d/3d卷积
- 1.2卷积--nn.Conv2d
- 1.3转置卷积(实现上采样)
- 2.池化层
- 3.线性层—Linear Layer
- 4.激活函数层—Activate Layer
1.卷积层-Convolution Layers
卷积运算:卷积运算在输入信号(图像)上滑动,相应位置上进行乘加.
卷积核:又称过滤器,可认为是某种形式,某种特征.
卷积过程:类似于用一个模板去图形上寻找与它相似的区域,与卷积核模式越相似,激活值越高,从而实现特征提取.
1.1 1d/2d/3d卷积
卷积维度:一般情况下,卷积核在几个维度上滑动,就是几维卷积
(1)一维卷积
卷积
(2)二维卷积
(3)三维卷积
1.2卷积–nn.Conv2d
functions: 对多个二维信号进行二维卷积
params:
Args:
in_channels (int): Number of channels in the input image
out_channels (int): Number of channels produced by the convolution
kernel_size (int or tuple): Size of the convolving kernel
stride (int or tuple, optional): Stride of the convolution. Default: 1
padding (int, tuple or str, optional): Padding added to all four sides of
the input. Default: 0
padding_mode (str, optional): ``'zeros'``, ``'reflect'``,
``'replicate'`` or ``'circular'``. Default: ``'zeros'``
dilation (int or tuple, optional): Spacing between kernel elements. Default: 1
groups (int, optional): Number of blocked connections from input
channels to output channels. Default: 1
bias (bool, optional): If ``True``, adds a learnable bias to the
output. Default: ``True``
Shape:
- Input: :math:`(N, C_{in}, H_{in}, W_{in})` or :math:`(C_{in}, H_{in}, W_{in})`
- Output: :math:`(N, C_{out}, H_{out}, W_{out})` or :math:`(C_{out}, H_{out}, W_{out})`, where
.. math::
H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[0] - \text{dilation}[0]
\times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1\right\rfloor
.. math::
W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[1] - \text{dilation}[1]
\times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1\right\rfloor
Attributes:
weight (Tensor): the learnable weights of the module of shape
:math:`(\text{out\_channels}, \frac{\text{in\_channels}}{\text{groups}},`
:math:`\text{kernel\_size[0]}, \text{kernel\_size[1]})`.
The values of these weights are sampled from
:math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})` where
:math:`k = \frac{groups}{C_\text{in} * \prod_{i=0}^{1}\text{kernel\_size}[i]}`
bias (Tensor): the learnable bias of the module of shape
(out_channels). If :attr:`bias` is ``True``,
then the values of these weights are
sampled from :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})` where
:math:`k = \frac{groups}{C_\text{in} * \prod_{i=0}^{1}\text{kernel\_size}[i]}`
examples:
Examples:
>>> # With square kernels and equal stride
>>> m = nn.Conv2d(16, 33, 3, stride=2)
>>> # non-square kernels and unequal stride and with padding
>>> m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2))
>>> # non-square kernels and unequal stride and with padding and dilation
>>> m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1))
>>> input = torch.randn(20, 16, 50, 100)
>>> output = m(input)
output_size calculation formula:
Conv2d运算原理:
主要代码段如下:
(1)加载图片,将图片处理成张量的形式:
# ================================= load img ==================================
path_img = os.path.join(os.path.dirname(os.path.abspath(__file__)), "pig.jpeg")
print(path_img)
img = Image.open(path_img).convert('RGB') # 0~255
# convert to tensor compose封装图像预处理方法
img_transform = transforms.Compose([transforms.ToTensor()])
img_tensor = img_transform(img)
# 添加 batch 维度
img_tensor.unsqueeze_(dim=0) # C*H*W to B*C*H*W
(2) 进行卷积操作:
# =============== create convolution layer ==================
# ================ 2d
flag = 1
#flag = 0
if flag:
#定义一个卷积层
conv_layer = nn.Conv2d(3, 1, 3) # input:(i, o, size) weights:(o, i , h, w)
# 初始化卷积层权值
nn.init.xavier_normal_(conv_layer.weight.data)
# nn.init.xavier_uniform_(conv_layer.weight.data)
# 卷积运算
img_conv = conv_layer(img_tensor)
result followed:
# ================================= visualization ==================================
print("卷积前尺寸:{}\n卷积后尺寸:{}".format(img_tensor.shape, img_conv.shape))
img_conv = transform_invert(img_conv[0, 0:1, ...], img_transform)
img_raw = transform_invert(img_tensor.squeeze(), img_transform)
plt.subplot(122).imshow(img_conv, cmap='gray')
plt.subplot(121).imshow(img_raw)
plt.show()
1.3转置卷积(实现上采样)
nn.ConvTranspose2d
Args:
in_channels (int): Number of channels in the input image
out_channels (int): Number of channels produced by the convolution
kernel_size (int or tuple): Size of the convolving kernel
stride (int or tuple, optional): Stride of the convolution. Default: 1
padding (int or tuple, optional): ``dilation * (kernel_size - 1) - padding`` zero-padding
will be added to both sides of each dimension in the input. Default: 0
output_padding (int or tuple, optional): Additional size added to one side
of each dimension in the output shape. Default: 0
groups (int, optional): Number of blocked connections from input channels to output channels. Default: 1
bias (bool, optional): If ``True``, adds a learnable bias to the output. Default: ``True``
dilation (int or tuple, optional): Spacing between kernel elements. Default: 1
""".format(**reproducibility_notes, **convolution_notes) + r"""
Shape:
- Input: :math:`(N, C_{in}, H_{in}, W_{in})` or :math:`(C_{in}, H_{in}, W_{in})`
- Output: :math:`(N, C_{out}, H_{out}, W_{out})` or :math:`(C_{out}, H_{out}, W_{out})`, where
.. math::
H_{out} = (H_{in} - 1) \times \text{stride}[0] - 2 \times \text{padding}[0] + \text{dilation}[0]
\times (\text{kernel\_size}[0] - 1) + \text{output\_padding}[0] + 1
.. math::
W_{out} = (W_{in} - 1) \times \text{stride}[1] - 2 \times \text{padding}[1] + \text{dilation}[1]
\times (\text{kernel\_size}[1] - 1) + \text{output\_padding}[1] + 1
Attributes:
weight (Tensor): the learnable weights of the module of shape
:math:`(\text{in\_channels}, \frac{\text{out\_channels}}{\text{groups}},`
:math:`\text{kernel\_size[0]}, \text{kernel\_size[1]})`.
The values of these weights are sampled from
:math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})` where
:math:`k = \frac{groups}{C_\text{out} * \prod_{i=0}^{1}\text{kernel\_size}[i]}`
bias (Tensor): the learnable bias of the module of shape (out_channels)
If :attr:`bias` is ``True``, then the values of these weights are
sampled from :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})` where
:math:`k = \frac{groups}{C_\text{out} * \prod_{i=0}^{1}\text{kernel\_size}[i]}`
Examples::
>>> # With square kernels and equal stride
>>> m = nn.ConvTranspose2d(16, 33, 3, stride=2)
>>> # non-square kernels and unequal stride and with padding
>>> m = nn.ConvTranspose2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2))
>>> input = torch.randn(20, 16, 50, 100)
>>> output = m(input)
>>> # exact output size can be also specified as an argument
>>> input = torch.randn(1, 16, 12, 12)
>>> downsample = nn.Conv2d(16, 16, 3, stride=2, padding=1)
>>> upsample = nn.ConvTranspose2d(16, 16, 3, stride=2, padding=1)
>>> h = downsample(input)
>>> h.size()
torch.Size([1, 16, 6, 6])
>>> output = upsample(h, output_size=input.size())
>>> output.size()
torch.Size([1, 16, 12, 12])
output size calculation formular
2.池化层
池化运算:对信号进行“收集”并“总结”, 类似水池收集水资源, 因而叫作池化层。
“收集”:多变少
“总结”:最大值 or 平均值
如图用2×2的窗口进行池化操作,最大池化用最大值代替这个窗口,平均池化用平均值代替这个窗口。
(1)nn.MaxPool2d
功能:对二维信号(图像)进行最大值池化
主要参数:
kernel_size:卷积核尺寸
stride:步长
padding:填充个数
dilation:池化间隔大小
ceil_mode:尺寸向上取整,默认为False
return_indices:记录池化像素索引
注意:stride一般设置的与窗口大小一致,以避免重叠
具体代码如下:
数据预处理:
set_seed(1) # 设置随机种子
# ================================= load img ==================================
path_img = os.path.join(os.path.dirname(os.path.abspath(__file__)), "pig.jpeg")
img = Image.open(path_img).convert('RGB') # 0~255
# convert to tensor
img_transform = transforms.Compose([transforms.ToTensor()])
img_tensor = img_transform(img)
img_tensor.unsqueeze_(dim=0) # C*H*W to B*C*H*W
最大池化代码:
# ================ maxpool
flag = 1
#flag = 0
if flag:
maxpool_layer = nn.MaxPool2d((2, 2), stride=(2, 2)) # input:(i, o, size) weights:(o, i , h, w)
img_pool = maxpool_layer(img_tensor)
the change of ouput size:
(2)nn.AvgPool2d 自己研究
功能:对二维信号(图像)进行平均值池化
主要参数:
kernel_size:卷积核尺寸
stride:步长
padding:填充个数
dilation:池化间隔大小
count_include_pad :填充值用于计算
divisor_override:除法因子(自定义分母)
平均池化代码:
# ================ avgpool
flag = 1
#flag = 0
if flag:
avgpoollayer = nn.AvgPool2d((2, 2), stride=(2, 2)) # input:(i, o, size) weights:(o, i , h, w)
img_pool = avgpoollayer(img_tensor)
effect result:
最大值池化和平均池化的差别:最大池化的亮度会稍微亮一些,毕竟它都是取的最大值,而平均池化是取平均值。
(3)nn.MaxUnpool2d
功能:对二维信号(图像)进行最大值池化上采样(反池化:将大尺寸图像变为小尺寸图像)
主要参数:
kernel_size:卷积核尺寸
stride:步长
padding:填充个数
这里的参数与池化层是类似的。唯一的不同就是前向传播的时候我们需要传进一个indices, 我们的索引值,要不然不知道把输入的元素放在输出的哪个位置上。
反池化代码:
# ================ max unpool
flag = 1
#flag = 0
if flag:
# pooling
img_tensor = torch.randint(high=5, size=(1, 1, 4, 4), dtype=torch.float)
#最大值池化保留索引
maxpool_layer = nn.MaxPool2d((2, 2), stride=(2, 2), return_indices=True)
img_pool, indices = maxpool_layer(img_tensor)
# unpooling
img_reconstruct = torch.randn_like(img_pool, dtype=torch.float)
#反池化操作
maxunpool_layer = nn.MaxUnpool2d((2, 2), stride=(2, 2))
img_unpool = maxunpool_layer(img_reconstruct, indices)
print("raw_img:\n{}\nimg_pool:\n{}".format(img_tensor, img_pool))
print("img_reconstruct:\n{}\nimg_unpool:\n{}".format(img_reconstruct, img_unpool))
3.线性层—Linear Layer
线性层又称为全连接层,其每个神经元与上一层所有神经元相连实现对前一层的线性组合,线性变换。
nn.Linear
功能:对一维信号(向量)进行线性组合
主要参数:
in_features:输入结点数
out_features:输出结点数
bias:是否需要偏置
计算公式:y = 𝒙𝑾𝑻 + 𝒃𝒊𝒂𝒔
具体代码如下:
# ================ linear
flag = 1
# flag = 0
if flag:
inputs = torch.tensor([[1., 2, 3]])
linear_layer = nn.Linear(3, 4)
linear_layer.weight.data = torch.tensor([[1., 1., 1.],
[2., 2., 2.],
[3., 3., 3.],
[4., 4., 4.]])
#设置偏置
linear_layer.bias.data.fill_(0)
output = linear_layer(inputs)
print(inputs, inputs.shape)
print(linear_layer.weight.data, linear_layer.weight.data.shape)
print(output, output.shape)
4.激活函数层—Activate Layer
激活函数对特征进行非线性变换,赋予多层神经网络具有深度的意义
(1)nn.Sigmoid
m = nn.Sigmoid()
input = torch.randn(2)
output = m(input)
(2)nn.tanh
m = nn.Tanh()
input = torch.randn(2)
output = m(input)
…