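[Experiment 10] Convolutional Neural Networks (1): Convolution Operators

1 Custom 2D convolution operator

2 Custom 2D convolution operator with stride and zero padding

3 Image edge detection

4 Custom convolutional layer and pooling layer operators

4.1 Convolutional layer:

4.2 Pooling layer:

5 Learning torch.nn.Conv2d(), torch.nn.MaxPool2d(), and torch.nn.AvgPool2d(), with a brief introduction to their usage

5.1 torch.nn.Conv2d():

5.2 torch.nn.MaxPool2d()

5.3 torch.nn.AvgPool2d()

6 Implementing the convolution below with both the custom operator and torch.nn.Conv2d()

Reference links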

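1 Custom 2D convolution operator

The main job of a convolution is to slide a kernel over an image (or feature map) and produce a new set of features from the resulting products.

The key step in a custom 2D convolution operator is a pair of nested loops that visits every position of the output matrix and computes the convolution value there. Inside the loops, X[:, i:i + u, j:j + v] extracts the sub-matrix of the input currently covered by the kernel. It is multiplied element-wise by the kernel self.weight, and torch.sum over the two spatial dimensions gives the result for that position.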

'''
@Author: lxy
@Function: Implement Conv2D Operator
@Date: 2024/11/06
'''
import torch
import torch.nn as nn
class Conv2D(nn.Module):
    def __init__(self, kernel_size, weight=None):
        super(Conv2D, self).__init__()
        # kernel_size is informational here; the kernel shape is taken from weight
        # Fall back to a default kernel when no weight is passed in
        if weight is None:
            weight = torch.tensor([[0., 1.], [2., 3.]], dtype=torch.float32)  # default 2x2 kernel
        else:
            # as_tensor accepts lists and tensors; detach().clone() avoids aliasing the caller's data
            weight = torch.as_tensor(weight, dtype=torch.float32).detach().clone()
        # Register the kernel as a learnable parameter
        self.weight = nn.Parameter(weight, requires_grad=True)

    def forward(self, X):
        """
        Input:
            - X: input tensor, shape=[B, M, N]; B is the batch size, M the height, N the width
        Output:
            - output: tensor of shape [B, output_height, output_width]
        """
        u, v = self.weight.shape  # kernel height and width
        B, M, N = X.shape  # input shape
        # Output height and width for stride 1 and no padding
        output_height = M - u + 1
        output_width = N - v + 1
        # Buffer that stores the convolution result
        output = torch.zeros((B, output_height, output_width), dtype=X.dtype)
        # Visit every output position and compute its convolution value
        for i in range(output_height):
            for j in range(output_width):
                # X[:, i:i + u, j:j + v]: sub-matrix of X currently covered by the kernel
                # ... * self.weight: element-wise product with the kernel
                # torch.sum(..., dim=(1, 2)): sum over height and width for each sample
                output[:, i, j] = torch.sum(X[:, i:i + u, j:j + v] * self.weight, dim=(1, 2))
        return output


# Build a fixed 3x3 input with a batch dimension of 1
torch.manual_seed(100)
inputs = torch.tensor([[[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]]])
# Create the convolution operator
conv2d = Conv2D(kernel_size=2)
outputs = conv2d(inputs)
print("Input: \n", inputs)
print("Output: \n", outputs)

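i:i + u: starting at row index i, select u consecutive rows, where u is the kernel height. This yields a slice that is u rows tall.

j:j + v: starting at column index j, select v consecutive columns, where v is the kernel width. This yields a slice that is v columns wide.

Output: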

Input: 
 tensor([[[1., 2., 3.],
         [4., 5., 6.],
         [7., 8., 9.]]])
Output: 
 tensor([[[25., 31.],
         [43., 49.]]], grad_fn=<CopySlices>)


The output matches the hand-computed result shown in the figure below (figure source: 卷积神经网络理论解读), so the convolution is implemented correctly.
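To make the hand calculation explicit, here is a minimal pure-Python sketch that recomputes the four output values without PyTorch. The helper name cross_correlate2d is introduced here for illustration only. Like the operator above, it slides the kernel without flipping it (strictly speaking a cross-correlation, which is what deep learning frameworks call convolution).

```python
def cross_correlate2d(image, kernel):
    """Slide kernel over image (stride 1, no padding) and sum the element-wise products."""
    u, v = len(kernel), len(kernel[0])
    out_h = len(image) - u + 1
    out_w = len(image[0]) - v + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            for a in range(u):
                for b in range(v):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(total)
        output.append(row)
    return output


image = [[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]]
kernel = [[0., 1.], [2., 3.]]
result = cross_correlate2d(image, kernel)
print(result)  # [[25.0, 31.0], [43.0, 49.0]]
```

For example, the top-left value is 1*0 + 2*1 + 4*2 + 5*3 = 25, which agrees with the tensor printed above.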

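2 Custom 2D convolution operator with stride and zero padding

In a 2D convolution, zero padding means symmetrically adding P rows and columns of zeros around the input matrix.

Padding the input with zeros makes it larger. By the definition of convolution, if no padding is applied and the kernel is larger than 1, the output feature map shrinks. Zero padding lets the kernel width and the output size be controlled independently.

To implement the operator with stride and zero padding, I extend the operator from part 1 with two parameters, stride=1 and padding=0 by default. The input X is zero-padded first: a new tensor new_X of shape [B, M + 2 * padding, N + 2 * padding] is created and X is copied into its center. The output height and width are then computed from the padded input following Equations 2-1 and 2-2, which in the code is (padded size - kernel size) // stride + 1. In each loop iteration, the start of the sub-matrix is derived from the stride, a sub-matrix the same size as the kernel is extracted from new_X, it is multiplied element-wise by the kernel, and the products are summed to give the convolution value at that position.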
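Equations 2-1 and 2-2 are not reproduced in this text, so the sketch below assumes they are the usual output-size rule, floor((input + 2 * padding - kernel) / stride) + 1, which is also what the forward method computes. The function name conv_output_size is hypothetical and only serves the illustration.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output length along one axis: floor((input + 2*padding - kernel) / stride) + 1."""
    padded = input_size + 2 * padding
    if kernel_size > padded:
        raise ValueError("kernel is larger than the padded input")
    return (padded - kernel_size) // stride + 1


same_size = conv_output_size(8, kernel_size=3, stride=1, padding=1)
halved = conv_output_size(8, kernel_size=3, stride=2, padding=1)
no_padding = conv_output_size(8, kernel_size=3)
print(same_size, halved, no_padding)  # 8 4 6
```

The first two calls correspond to the two configurations tested in this section; the third shows how a 3×3 kernel shrinks an 8×8 input to 6×6 when no padding is used.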

'''
@Author: lxy
@Function: Implement Conv2D Operator with Stride and Padding
@Date: 2024/11/06
'''
import torch
import torch.nn as nn
class Conv2D(nn.Module):
    def __init__(self, kernel_size, stride=1, padding=0, weight=None):
        super(Conv2D, self).__init__()
        # Initialize the kernel (kernel_size is informational; the shape comes from weight)
        if weight is None:
            weight = torch.tensor([[0., 1., 2.], [3., 4., 5.], [6., 7., 8.]], dtype=torch.float32)
        else:
            # as_tensor accepts lists and tensors; detach().clone() avoids aliasing the caller's data
            weight = torch.as_tensor(weight, dtype=torch.float32).detach().clone()
        # Register the kernel as a Parameter so it can be optimized during training
        self.weight = nn.Parameter(weight, requires_grad=True)
        self.stride = stride  # stride
        self.padding = padding  # zero padding added on every side

    def forward(self, X):
        """
        Input:
            - X: input tensor, shape=[B, M, N]; B is the batch size, M the height, N the width
        Output:
            - output: tensor of shape [B, output_height, output_width]
        """
        # Zero padding
        new_X = torch.zeros([X.shape[0], X.shape[1] + 2 * self.padding, X.shape[2] + 2 * self.padding], dtype=X.dtype)
        # Copy X into the center of new_X, leaving `padding` zeros on all four sides
        new_X[:, self.padding:X.shape[1] + self.padding, self.padding:X.shape[2] + self.padding] = X
        u, v = self.weight.shape  # kernel height and width
        B, M, N = new_X.shape  # shape after padding
        # Output height and width
        output_height = (M - u) // self.stride + 1
        output_width = (N - v) // self.stride + 1
        output = torch.zeros((B, output_height, output_width), dtype=X.dtype)  # output buffer
        # Visit every output position and compute its convolution value
        for i in range(output_height):
            for j in range(output_width):
                # The stride determines where the window starts
                row_start = i * self.stride
                col_start = j * self.stride
                # Sub-matrix of the padded input currently covered by the kernel
                region = new_X[:, row_start:row_start + u, col_start:col_start + v]
                # Element-wise product with the kernel, summed over height and width
                output[:, i, j] = torch.sum(region * self.weight, dim=(1, 2))
        return output


# Test code
torch.manual_seed(100)
inputs = torch.randn(size=[2, 8, 8])  # random feature maps: 2 samples of size 8x8
print(f'inputs: {inputs}')
conv2d_padding = Conv2D(kernel_size=3, stride=1, padding=1)  # convolve with padding 1
outputs = conv2d_padding(inputs)
print(f"When kernel_size=3, padding=1 stride=1, input's shape: {inputs.shape}, output's shape: {outputs.shape}")
conv2d_stride = Conv2D(kernel_size=3, stride=2, padding=1)
outputs = conv2d_stride(inputs)
print(f"When kernel_size=3, padding=1 stride=2, input's shape: {inputs.shape}, output's shape: {outputs.shape}")

Output:

inputs: tensor([[[ 0.1268,  1.3564,  0.5632, -0.1039, -0.3575,  0.3917, -0.6801,
           0.2409],
         [ 0.4698,  1.2426,  0.5403, -1.1454, -1.4592, -1.6281,  0.3834,
          -0.1718],
         [-3.1896,  1.5914, -0.0247, -0.8466,  0.0293, -0.5721, -1.2546,
           0.0486],
         [ 1.1705, -0.5410, -0.7116,  0.0575,  0.6263, -1.7736, -0.2205,
           2.7467],
         [-1.7599,  1.0230, -0.1107, -0.3899, -1.0300, -1.5446,  0.5730,
          -2.0956],
         [-0.1806,  0.2346, -0.1477,  0.5893,  2.2533, -0.2555,  0.1651,
          -0.1629],
         [-0.8039, -0.9174,  0.8986,  0.8262, -0.3668, -0.4251, -1.2455,
           1.1245],
         [-2.0157,  0.9926, -0.6084, -1.3856,  1.0412, -0.8043, -0.6244,
          -0.5882]],

        [[ 1.6700, -0.9275, -0.9759,  1.3312,  0.9007, -0.6585, -0.9327,
          -1.5749],
         [ 1.4861, -1.4092,  1.4330,  0.3899, -0.1152, -0.2361, -2.2235,
           0.0788],
         [ 0.0416,  1.2813, -0.8262,  0.0231,  1.9301,  0.7803,  0.3180,
          -0.6992],
         [-0.3921,  2.1955,  0.3312,  0.1417, -1.5268,  0.2521,  0.6541,
           2.1024],
         [ 0.6331,  1.9332, -0.2463, -0.7009,  0.6362, -0.5659,  1.0318,
          -1.0371],
         [ 0.1374, -1.1312,  0.6471, -0.7183, -1.1984, -0.8838,  0.6430,
           0.0720],
         [-0.5723,  1.6078,  0.1001, -1.0746, -0.1092,  0.2463, -0.9944,
          -0.6886],
         [ 1.2039, -0.2519, -1.9443, -1.9203,  1.1464,  2.3850, -0.0355,
          -0.3179]]])
When kernel_size=3, padding=1 stride=1, input's shape: torch.Size([2, 8, 8]), output's shape: torch.Size([2, 8, 8])
When kernel_size=3, padding=1 stride=2, input's shape: torch.Size([2, 8, 8]), output's shape: torch.Size([2, 4, 4])

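The output shows that with a 3×3 kernel and padding=1, the output feature map keeps the same size as the input when stride=1. With padding=1 and stride=2, both the width and the height of the output feature map are halved.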

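3 Image edge detection

Object edges are extracted with the Laplacian operator, a 3×3 kernel whose center element is 8 and whose other elements are −1, as shown in Figure 3-1: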
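As a rough illustration of why this kernel responds to edges, the pure-Python sketch below applies it to two hand-made 3×3 patches. The patch values and the helper name kernel_response are made up for the example. Because the kernel entries sum to zero, a uniform patch gives no response, while a patch that straddles a brightness step gives a large one.

```python
LAPLACIAN = [[-1, -1, -1],
             [-1,  8, -1],
             [-1, -1, -1]]


def kernel_response(patch, kernel=LAPLACIAN):
    """Sum of element-wise products between a 3x3 patch and the kernel."""
    return sum(patch[a][b] * kernel[a][b] for a in range(3) for b in range(3))


flat_patch = [[100, 100, 100]] * 3  # uniform brightness
edge_patch = [[0, 255, 255]] * 3    # dark-to-bright vertical edge, center on the bright side

kernel_sum = sum(sum(row) for row in LAPLACIAN)
flat = kernel_response(flat_patch)
edge = kernel_response(edge_patch)
print(kernel_sum, flat, edge)  # 0 0 765
```

Note that the edge response (765) already exceeds the 0 to 255 range of an 8-bit image, so the raw output of this kernel generally needs clipping or rescaling before it is displayed.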

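The image I use is the grayscale image "Cameraman.tif". The key step in edge detection is converting the image into an input tensor that the convolution can consume: read the image file with an image library (such as PIL) ----> resize it to 256x256 ----> convert the image data to a NumPy array for numerical computation ----> convert the NumPy array to a PyTorch tensor, since the convolution operates on tensors ----> add a batch dimension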

'''
@Author: lxy
@Function: Edge detection by CNN
@Date: 2024/11/06
'''
import torch
import torch.nn as nn
import numpy as np
from PIL import Image
import matplotlib
matplotlib.use('TkAgg')
import matplotlib.pyplot as plt
class Conv2D(nn.Module):
    def __init__(self, kernel_size, stride=1, padding=0, weight=None):
        super(Conv2D, self).__init__()
        # Initialize the kernel (kernel_size is informational; the shape comes from weight)
        if weight is None:
            weight = torch.tensor([[0., 1., 2.], [3., 4., 5.], [6., 7., 8.]], dtype=torch.float32)
        else:
            # as_tensor accepts lists and tensors; detach().clone() avoids aliasing the caller's data
            weight = torch.as_tensor(weight, dtype=torch.float32).detach().clone()
        # Register the kernel as a Parameter so it can be optimized during training
        self.weight = nn.Parameter(weight, requires_grad=True)
        self.stride = stride  # stride
        self.padding = padding  # zero padding added on every side

    def forward(self, X):
        """
        Input:
            - X: input tensor, shape=[B, M, N]; B is the batch size, M the height, N the width
        Output:
            - output: tensor of shape [B, output_height, output_width]
        """
        # Zero padding
        new_X = torch.zeros([X.shape[0], X.shape[1] + 2 * self.padding, X.shape[2] + 2 * self.padding], dtype=X.dtype)
        # Copy X into the center of new_X, leaving `padding` zeros on all four sides
        new_X[:, self.padding:X.shape[1] + self.padding, self.padding:X.shape[2] + self.padding] = X
        u, v = self.weight.shape  # kernel height and width
        B, M, N = new_X.shape  # shape after padding
        # Output height and width
        output_height = (M - u) // self.stride + 1
        output_width = (N - v) // self.stride + 1
        output = torch.zeros((B, output_height, output_width), dtype=X.dtype)  # output buffer
        # Visit every output position and compute its convolution value
        for i in range(output_height):
            for j in range(output_width):
                # The stride determines where the window starts
                row_start = i * self.stride
                col_start = j * self.stride
                # Sub-matrix of the padded input currently covered by the kernel
                region = new_X[:, row_start:row_start + u, col_start:col_start + v]
                # Element-wise product with the kernel, summed over height and width
                output[:, i, j] = torch.sum(region * self.weight, dim=(1, 2))
        return output

# Read the image; convert('L') guarantees a single-channel grayscale array
img = Image.open('Cameraman.tif').convert('L').resize((256, 256))
img = np.array(img, dtype='float32')  # convert the image to a NumPy array
# Laplacian kernel
weight = torch.tensor([[-1,-1,-1], [-1,8,-1], [-1,-1,-1]], dtype=torch.float32)
conv = Conv2D(kernel_size=3, stride=1, padding=0, weight=weight)

# Make sure the image is a float32 numpy.ndarray
inputs = img.astype('float32')

# Convert the image to a tensor
inputs = torch.tensor(inputs)
# Add a batch dimension at axis 0 so the input shape becomes (1, height, width)
inputs = torch.unsqueeze(inputs, dim=0)
# Apply the convolution
outputs = conv(inputs)
# Visualize the result
plt.subplot(121).set_title('Input Image', fontsize=15)
plt.imshow(img.astype('uint8'), cmap='gray', vmin=0, vmax=255)
plt.subplot(122).set_title('Output Feature Map', fontsize=15)
plt.imshow(outputs.squeeze().detach().numpy(), cmap='gray', vmin=0, vmax=255)
plt.show()

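Output:

The output shows that the Laplacian kernel extracts the edges well. I had already done this in Assignment 6: Convolution, so the edge extraction went smoothly this time. [One thing to note: the final imshow calls pass vmin=0 and vmax=255. Without them, matplotlib rescales the colormap to the minimum and maximum of the data, and the filter response is not confined to the 0 to 255 range.]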

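4 Custom convolutional layer and pooling layer operators

4.1 Convolutional layer:

To implement a multi-channel convolutional layer, I add a single_forward method to the earlier 2D convolution; it convolves one input channel with one kernel slice. In forward, for each output channel, the code loops over the input channels, calls single_forward on each, sums the per-channel results, and adds the bias. Finally, the feature maps of all output channels are stacked to form the output tensor.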

'''
@Author: lxy
@Function: Implement multi-channel convolution
@Date: 2024/11/06
'''
import torch
import torch.nn as nn

class Conv2D(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0, weight=None, bias=None):
        super(Conv2D, self).__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.stride = stride
        self.padding = padding
        # Weight shape: [out_channels, in_channels, kernel_height, kernel_width]
        if weight is None:
            weight = torch.randn((out_channels, in_channels, kernel_size, kernel_size), dtype=torch.float32)
        else:
            weight = torch.as_tensor(weight, dtype=torch.float32).detach().clone()
        # One bias per output channel
        if bias is None:
            bias = torch.zeros(out_channels)
        else:
            bias = torch.as_tensor(bias, dtype=torch.float32).detach().clone()
        # Register kernel and bias as learnable parameters
        self.weight = nn.Parameter(weight, requires_grad=True)
        self.bias = nn.Parameter(bias, requires_grad=True)

    # Convolution of one input channel with one 2D kernel slice
    def single_forward(self, X, weight):
        # Zero-pad the input on all four sides
        new_X = torch.zeros((X.shape[0], X.shape[1] + 2 * self.padding, X.shape[2] + 2 * self.padding))
        new_X[:, self.padding:X.shape[1] + self.padding, self.padding:X.shape[2] + self.padding] = X

        u, v = weight.shape  # kernel height and width
        output_h = (new_X.shape[1] - u) // self.stride + 1
        output_w = (new_X.shape[2] - v) // self.stride + 1
        output = torch.zeros((X.shape[0], output_h, output_w))

        for i in range(output_h):
            for j in range(output_w):
                output[:, i, j] = torch.sum(
                    new_X[:, i * self.stride:i * self.stride + u, j * self.stride:j * self.stride + v] * weight,
                    dim=(1, 2)
                )
        return output

    def forward(self, X):
        """
        Input:
            - X: input tensor, shape=[B, C_in, H, W],
              where B is the batch size, C_in the number of input channels, H the height, W the width
        Output:
            - output: tensor of shape [B, C_out, output_height, output_width]
        """
        feature_maps = []
        for w, b in zip(self.weight, self.bias):  # loop over output channels
            multi_outs = []
            for i in range(self.in_channels):  # convolve each input channel
                # Use the argument X, not the global variable inputs
                single = self.single_forward(X[:, i, :, :], w[i])
                multi_outs.append(single)
            # Sum the per-channel results and add the bias
            feature_map = torch.sum(torch.stack(multi_outs), dim=0) + b
            feature_maps.append(feature_map)
        # Stack the results of all output channels along the channel axis
        out = torch.stack(feature_maps, dim=1)
        return out

# Test code
torch.manual_seed(100)
inputs = torch.tensor(
    [[[[0.0, 1.0, 2.0],
       [3.0, 4.0, 5.0],
       [6.0, 7.0, 8.0]],

      [[1.0, 2.0, 3.0],
       [4.0, 5.0, 6.0],
       [7.0, 8.0, 9.0]]]]
)
conv2d = Conv2D(in_channels=2, out_channels=3, kernel_size=2)
# Output channels = number of kernels = out_channels
# Output height = (input height - kernel height) // stride + 1 = (3 - 2) // 1 + 1 = 2
# Output width  = (input width - kernel width) // stride + 1 = (3 - 2) // 1 + 1 = 2
print("inputs shape:",inputs.shape)
outputs = conv2d(inputs)
print("Conv2D outputs shape:",outputs.shape)
# Compare with the PyTorch API
conv2d_pytorch = nn.Conv2d(in_channels=2, out_channels=3, kernel_size=2)
outputs_pytorch = conv2d_pytorch(inputs)
# Result of the custom operator
print('Conv2D outputs:', outputs)
# Result of the PyTorch API
print('nn.Conv2D outputs:', outputs_pytorch)

Output:

inputs shape: torch.Size([1, 2, 3, 3])
Conv2D outputs shape: torch.Size([1, 3, 2, 2])
Conv2D outputs: tensor([[[[ 1.5407,  3.0784],
          [ 6.1537,  7.6914]],

         [[ 8.9462, 10.4326],
          [13.4054, 14.8918]],

         [[ 4.7900,  5.0379],
          [ 5.5338,  5.7817]]]], grad_fn=<StackBackward0>)
nn.Conv2D outputs: tensor([[[[-1.4994, -2.2460],
          [-3.7392, -4.4858]],

         [[ 1.0521,  1.5089],
          [ 2.4226,  2.8794]],

         [[ 3.7235,  4.6236],
          [ 6.4239,  7.3240]]]], grad_fn=<ConvolutionBackward0>)

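The output shows that for an input feature map with 2 channels of size 3×3, three kernels of size 2×2 (each spanning both input channels) produce an output with 3 channels of size 2×2, which matches the size given by Equations 2-1 and 2-2. The values from the custom operator and from the API differ. From what I found, this is mainly because the API initializes the kernel weights with Kaiming initialization, whereas the custom operator draws them with torch.randn (and the API also initializes its bias randomly rather than to zero), so the outputs differ as well. [The article "pytorch中卷积操作的初始化方法(kaiming_uniform_详解)" requires a membership, so I captured its key part in the figure below.]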
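One way to check that the mismatch comes only from initialization is to feed the API layer's own weight and bias into a loop-based convolution and compare the results. The sketch below does that with a hypothetical helper naive_conv2d (stride 1, no padding); it is not the Conv2D class above. It also checks the documented default range of nn.Conv2d weights, U(-1/sqrt(fan_in), 1/sqrt(fan_in)), which I am assuming holds for the installed PyTorch version.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Conv2d(in_channels=2, out_channels=3, kernel_size=2)
x = torch.arange(18, dtype=torch.float32).reshape(1, 2, 3, 3)


def naive_conv2d(x, weight, bias):
    """Loop-based multi-channel convolution (stride 1, no padding)."""
    B, _, H, W = x.shape
    C_out, _, kh, kw = weight.shape
    out = torch.zeros(B, C_out, H - kh + 1, W - kw + 1)
    for i in range(H - kh + 1):
        for j in range(W - kw + 1):
            patch = x[:, :, i:i + kh, j:j + kw]              # [B, C_in, kh, kw]
            prod = patch.unsqueeze(1) * weight.unsqueeze(0)  # [B, C_out, C_in, kh, kw]
            out[:, :, i, j] = prod.sum(dim=(2, 3, 4)) + bias
    return out


with torch.no_grad():
    reference = layer(x)
    manual = naive_conv2d(x, layer.weight, layer.bias)

fan_in = 2 * 2 * 2  # in_channels * kernel_height * kernel_width
bound = 1 / math.sqrt(fan_in)
print(torch.allclose(manual, reference, atol=1e-5))  # same weights give the same output
print(layer.weight.abs().max().item() <= bound)      # default weights stay within the bound
```

With shared parameters the two computations agree up to floating-point error, which supports the explanation that only the initial values differ.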

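4.2 Pooling layer:

A pooling layer performs feature selection and reduces the number of features, which in turn reduces the number of parameters in later layers. After pooling the feature map is smaller, which cuts the number of neurons, saves memory, and improves computational efficiency. The two common pooling methods are average pooling and max pooling. The output size of a pooling layer is computed the same way as for a convolutional layer, that is, by Equations 2-1 and 2-2.

Average pooling: divide the input feature map into regions (2×2 in this example) and use the mean activation of the neurons in each region as that region's representation.

Max pooling: use the maximum activation of the neurons in each sub-region of the input feature map as that region's representation.

To implement the pooling layer, the constructor __init__ takes three parameters: size (the pooling window, 2x2 by default), mode (the pooling mode, 'max' or 'avg', 'max' by default), and stride (1 by default). The forward function then does the pooling: compute the height and width of the pooled feature map ---> loop over every output position with two nested loops and apply max or average pooling according to mode.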

'''
@Author: lxy
@Function: Implement pool2D Operator
@Date: 2024/11/7
'''
import torch
import torch.nn as nn
class Pool2D(nn.Module):
    def __init__(self, size=(2, 2), mode='max', stride=1):
        super(Pool2D, self).__init__()
        # Pooling mode: 'max' or 'avg'
        self.mode = mode
        self.h, self.w = size
        self.stride = stride
    def forward(self, x):
        # x has shape [B, C, H, W]
        output_h = (x.shape[2] - self.h) // self.stride + 1
        output_w = (x.shape[3] - self.w) // self.stride + 1
        output = torch.zeros([x.shape[0], x.shape[1], output_h, output_w], dtype=x.dtype, device=x.device)

        # Pooling
        for i in range(output_h):
            for j in range(output_w):
                # Window covered at this output position
                region = x[:, :, self.stride*i:self.stride*i+self.h, self.stride*j:self.stride*j+self.w]
                # Max pooling: maximum over the window's height and width
                if self.mode == 'max':
                    output[:, :, i, j] = torch.amax(region, dim=(2, 3))
                # Average pooling: mean over the window's height and width
                elif self.mode == 'avg':
                    output[:, :, i, j] = torch.mean(region, dim=[2, 3])
        return output

# Test the custom pooling layer
inputs = torch.tensor([[[[1., 2., 3., 4.],
                         [5., 6., 7., 8.],
                         [9., 10., 11., 12.],
                         [13., 14., 15., 16.]]]], dtype=torch.float32)

pool2d = Pool2D(stride=2)
outputs = pool2d(inputs)
print("input: {}, \noutput: {}".format(inputs.shape, outputs.shape))

# Compare max pooling with the PyTorch API
maxpool2d_pytorch = nn.MaxPool2d(kernel_size=(2, 2), stride=2)
outputs_pytorch = maxpool2d_pytorch(inputs)
print('Maxpool2D outputs:', outputs)
print('nn.MaxPool2d outputs:', outputs_pytorch)

# Compare average pooling with the PyTorch API
avgpool2d_pytorch = nn.AvgPool2d(kernel_size=(2, 2), stride=2)
outputs_pytorch_avg = avgpool2d_pytorch(inputs)
pool2d_avg = Pool2D(mode='avg', stride=2)
outputs_avg = pool2d_avg(inputs)
print('Avgpool2D outputs:', outputs_avg)
print('nn.AvgPool2d outputs:', outputs_pytorch_avg)

Output:

input: torch.Size([1, 1, 4, 4]), 
output: torch.Size([1, 1, 2, 2])
Maxpool2D outputs: tensor([[[[ 6.,  8.],
          [14., 16.]]]])
nn.MaxPool2d outputs: tensor([[[[ 6.,  8.],
          [14., 16.]]]])
Avgpool2D outputs: tensor([[[[ 3.5000,  5.5000],
          [11.5000, 13.5000]]]])
nn.AvgPool2d outputs: tensor([[[[ 3.5000,  5.5000],
          [11.5000, 13.5000]]]])

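The output shows that the custom pooling layer gives the same results as PyTorch's built-in pooling, and also matches the hand calculation (see the figure below; figure source: 卷积神经网络理论解读).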
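The hand calculation can also be reproduced in a few lines of plain Python. This is only a sketch for a single-channel matrix with a square window; the function name pool2d and its arguments are chosen for the example.

```python
def pool2d(matrix, size=2, stride=2, mode="max"):
    """Max or average pooling of a 2D list with a square window."""
    out_h = (len(matrix) - size) // stride + 1
    out_w = (len(matrix[0]) - size) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Collect the values inside the current window
            window = [matrix[stride * i + a][stride * j + b]
                      for a in range(size) for b in range(size)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        output.append(row)
    return output


grid = [[1., 2., 3., 4.],
        [5., 6., 7., 8.],
        [9., 10., 11., 12.],
        [13., 14., 15., 16.]]
max_pooled = pool2d(grid, mode="max")
avg_pooled = pool2d(grid, mode="avg")
print(max_pooled)  # [[6.0, 8.0], [14.0, 16.0]]
print(avg_pooled)  # [[3.5, 5.5], [11.5, 13.5]]
```

For instance, the top-left window holds 1, 2, 5, 6, so its maximum is 6 and its mean is 14 / 4 = 3.5.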

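5 Learning torch.nn.Conv2d(), torch.nn.MaxPool2d(), and torch.nn.AvgPool2d(), with a brief introduction to their usage

5.1 torch.nn.Conv2d():

PyTorch official documentation: Conv2d

Function: Conv2d is a 2D convolutional layer. It convolves the input with a set of filters (kernels) to produce feature maps.

Parameters:

in_channels: number of channels in the input.

out_channels: number of channels in the output, that is, the number of kernels.

kernel_size: size of the kernel.

stride: stride of the convolution, 1 by default.

padding: padding added to the borders, 0 by default.

(Other parameters include dilation, groups, bias, and padding_mode.)

Variables: adjusted during training

weight: the learnable weights of the module

bias: the learnable bias of the module

Usage: import torch.nn --> create a Conv2d instance --> prepare the input --> pass the input to the instance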

import torch
import torch.nn as nn

# Create a Conv2d layer
conv_layer = nn.Conv2d(in_channels=1, out_channels=3, kernel_size=3, stride=1, padding=1)
# Prepare the input
input_data = torch.randn(1, 1, 2, 2)  # a single-channel 2x2 image with batch size 1
# Apply the convolutional layer
output_data = conv_layer(input_data)
# Inspect the output shape
print(output_data.shape)
# Access the weights
weights = conv_layer.weight
# Access the bias
biases = conv_layer.bias
print(f"weight: {weights}\nbias: {biases}")

Output:

torch.Size([1, 3, 2, 2])
weight: Parameter containing:
tensor([[[[-0.0108,  0.1053,  0.1481],
          [-0.2397,  0.2453,  0.3193],
          [-0.1925,  0.2957, -0.1676]]],


        [[[-0.0294, -0.0354, -0.0680],
          [ 0.2461, -0.0301, -0.3216],
          [-0.3042,  0.2487, -0.1013]]],


        [[[-0.3092,  0.0519,  0.1740],
          [ 0.1984, -0.0373,  0.1381],
          [-0.0346, -0.2134, -0.2715]]]], requires_grad=True)
bias: Parameter containing:
tensor([ 0.1406,  0.0162, -0.1567], requires_grad=True)

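5.2 torch.nn.MaxPool2d()

PyTorch official documentation: MaxPool2d

Function: a 2D max pooling layer, used for downsampling. It reduces the spatial size of the feature map while keeping the strongest activation in each window.

Parameters:

kernel_size: size of the pooling window.

stride: stride of the pooling, equal to kernel_size by default.

padding: padding added to the borders, 0 by default.

(Other parameters include dilation, return_indices, and ceil_mode.)

Variables: none. The layer has no learnable parameters, so unlike a convolutional layer it has no weight or bias.

Usage: import torch.nn --> create a MaxPool2d instance --> the input is usually the output of a convolutional layer --> pass the input to the max pooling layer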

# Create a MaxPool2d layer (continues from the Conv2d example above)
maxpool_layer = nn.MaxPool2d(kernel_size=2, stride=2)
# Use the output of the convolutional layer as input
input_data = output_data
# Apply the max pooling layer
output_data_maxpool = maxpool_layer(input_data)
print(f"Before pooling: {output_data}\nAfter max pooling: {output_data_maxpool}")

Output:

Before pooling: tensor([[[[-0.2330,  0.2478],
          [-0.2796,  0.6367]],

         [[ 0.4354,  0.9828],
          [ 0.3563, -0.0494]],

         [[-0.4478,  0.4982],
          [ 0.6387,  0.1688]]]], grad_fn=<ConvolutionBackward0>)
After max pooling: tensor([[[[0.6367]],

         [[0.9828]],

         [[0.6387]]]], grad_fn=<MaxPool2DWithIndicesBackward0>)
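The defaults listed above can be checked on a small input. The sketch below uses a made-up 4×4 tensor holding 0 to 15; it shows that omitting stride is the same as setting it to kernel_size, that stride=1 makes the windows overlap, and that return_indices=True additionally returns the position of each maximum, flattened over the height and width of each channel.

```python
import torch
import torch.nn as nn

x = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)

default_stride = nn.MaxPool2d(kernel_size=2)             # stride defaults to kernel_size
explicit_stride = nn.MaxPool2d(kernel_size=2, stride=2)
overlapping = nn.MaxPool2d(kernel_size=2, stride=1)      # neighbouring windows overlap

pooled = default_stride(x)
overlap_shape = overlapping(x).shape

# return_indices=True returns a (values, indices) pair
with_indices = nn.MaxPool2d(kernel_size=2, return_indices=True)
values, indices = with_indices(x)

print(pooled)         # maxima of the four non-overlapping 2x2 windows: 5, 7, 13, 15
print(overlap_shape)  # torch.Size([1, 1, 3, 3])
print(indices)        # flat positions of those maxima in the 4x4 plane
```

Because this input stores each element's own flat index as its value, the returned indices coincide with the pooled values here.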

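5.3 torch.nn.AvgPool2d()

PyTorch official documentation: AvgPool2d

Function: a 2D average pooling layer. It works like max pooling but computes the mean of all elements in the pooling window.

Parameters:

kernel_size: size of the pooling window.

stride: stride of the pooling, equal to kernel_size by default.

padding: padding added to the borders, 0 by default.

(Other parameters include ceil_mode, count_include_pad, and divisor_override. Unlike MaxPool2d, AvgPool2d has no dilation or return_indices parameter.)

Variables: none. The layer has no learnable parameters, so unlike a convolutional layer it has no weight or bias.

Usage: import torch.nn --> create an AvgPool2d instance --> the input is usually the output of a convolutional layer --> pass the input to the average pooling layer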

# Create an AvgPool2d layer (continues from the Conv2d example above)
avgpool_layer = nn.AvgPool2d(kernel_size=2, stride=2)
# Use the output of the convolutional layer as input
input_data = output_data
# Apply the average pooling layer
output_data_avgpool = avgpool_layer(input_data)
print(f"Before pooling: {output_data}\nAfter average pooling: {output_data_avgpool}")

Output:

Before pooling: tensor([[[[-0.2330,  0.2478],
          [-0.2796,  0.6367]],

         [[ 0.4354,  0.9828],
          [ 0.3563, -0.0494]],

         [[-0.4478,  0.4982],
          [ 0.6387,  0.1688]]]], grad_fn=<ConvolutionBackward0>)
After average pooling: tensor([[[[0.0930]],

         [[0.4313]],

         [[0.2145]]]], grad_fn=<AvgPool2DBackward0>)
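AvgPool2d also accepts a count_include_pad flag (True by default) that matters once padding is used. The sketch below is a small illustration on a made-up 2×2 input of ones: with padding=1, every 2×2 window covers one real pixel and three padded zeros, so the two settings give visibly different averages.

```python
import torch
import torch.nn as nn

x = torch.ones(1, 1, 2, 2)

# Padded zeros are counted in the denominator (default behaviour)
include_pad = nn.AvgPool2d(kernel_size=2, stride=2, padding=1)
# Padded zeros are left out of the denominator
exclude_pad = nn.AvgPool2d(kernel_size=2, stride=2, padding=1, count_include_pad=False)

with_pad = include_pad(x)
without_pad = exclude_pad(x)
print(with_pad)     # every entry is 1 / 4 = 0.25
print(without_pad)  # every entry is 1 / 1 = 1.0
```

Which setting is appropriate depends on whether border outputs should be damped by the padding; the default keeps the divisor fixed at the window size.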

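6 Implementing the convolution below with both the custom operator and torch.nn.Conv2d()

Using the convolutional layer operator from part 4, follow the figure below: define the input feature map --> define the kernel weights --> define the kernel biases --> run the convolution with both the custom operator and PyTorch's built-in Conv2d.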

'''
@Author: lxy
@Function: Implement a specific convolution with both the custom operator and torch.nn.Conv2d()
@Date: 2024/11/07
'''
import torch
import torch.nn as nn

class Conv2D(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0, weight=None, bias=None):
        super(Conv2D, self).__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.stride = stride
        self.padding = padding
        # Weight shape: [out_channels, in_channels, kernel_height, kernel_width]
        if weight is None:
            weight = torch.randn((out_channels, in_channels, kernel_size, kernel_size), dtype=torch.float32)
        else:
            weight = torch.as_tensor(weight, dtype=torch.float32).detach().clone()
        # One bias per output channel
        if bias is None:
            bias = torch.zeros(out_channels)
        else:
            bias = torch.as_tensor(bias, dtype=torch.float32).detach().clone()
        # Register kernel and bias as learnable parameters
        self.weight = nn.Parameter(weight, requires_grad=True)
        self.bias = nn.Parameter(bias, requires_grad=True)

    # Convolution of one input channel with one 2D kernel slice
    def single_forward(self, X, weight):
        # Zero-pad the input on all four sides
        new_X = torch.zeros((X.shape[0], X.shape[1] + 2 * self.padding, X.shape[2] + 2 * self.padding))
        new_X[:, self.padding:X.shape[1] + self.padding, self.padding:X.shape[2] + self.padding] = X

        u, v = weight.shape  # kernel height and width
        output_h = (new_X.shape[1] - u) // self.stride + 1
        output_w = (new_X.shape[2] - v) // self.stride + 1
        output = torch.zeros((X.shape[0], output_h, output_w))

        for i in range(output_h):
            for j in range(output_w):
                output[:, i, j] = torch.sum(
                    new_X[:, i * self.stride:i * self.stride + u, j * self.stride:j * self.stride + v] * weight,
                    dim=(1, 2)
                )
        return output

    def forward(self, X):
        """
        Input:
            - X: input tensor, shape=[B, C_in, H, W],
              where B is the batch size, C_in the number of input channels, H the height, W the width
        Output:
            - output: tensor of shape [B, C_out, output_height, output_width]
        """
        feature_maps = []
        for w, b in zip(self.weight, self.bias):  # loop over output channels
            multi_outs = []
            for i in range(self.in_channels):  # convolve each input channel
                # Use the argument X, not the global variable inputs
                single = self.single_forward(X[:, i, :, :], w[i])
                multi_outs.append(single)
            # Sum the per-channel results and add the bias
            feature_map = torch.sum(torch.stack(multi_outs), dim=0) + b
            feature_maps.append(feature_map)
        # Stack the results of all output channels along the channel axis
        out = torch.stack(feature_maps, dim=1)
        return out

# Test code
torch.manual_seed(100)
# Input feature map
inputs = torch.tensor([[[[0, 1, 1, 0, 2],
                         [2, 2, 2, 2, 1],
                         [1, 0, 0, 2, 0],
                         [0, 1, 1, 0, 0],
                         [1, 2, 0, 0, 2]],

                        [[1, 0, 2, 2, 0],
                         [0, 0, 0, 2, 0],
                         [1, 2, 1, 2, 1],
                         [1, 0, 0, 0, 0],
                         [1, 2, 1, 1, 1]],

                        [[2, 1, 2, 0, 0],
                         [1, 0, 0, 1, 0],
                         [0, 2, 1, 0, 1],
                         [0, 1, 2, 2, 2],
                         [2, 1, 0, 0, 1]]]], dtype=torch.float32)
# Custom kernel weights
weight = torch.tensor([[[[-1, 1, 0],  # first kernel group
                         [0, 1, 0],
                         [0, 1, 1]],

                        [[-1, -1, 0],
                         [0, 0, 0],
                         [0, -1, 0]],

                        [[0, 0, -1],
                         [0, 1, 0],
                         [1, -1, -1]]],

                       [[[1, 1, -1],  # second kernel group
                         [-1, -1, 1],
                         [0, -1, 1]],

                        [[0, 1, 0],
                         [-1, 0, -1],
                         [-1, 1, 0]],

                        [[-1, 0, 0],
                         [-1, 0, 1],
                         [-1, 0, 0]]]], dtype=torch.float32)

# Custom kernel biases
bias = torch.tensor([1., 0.])
# Create a convolutional layer: 3 input channels, 2 output channels, 3x3 kernel
conv2d = Conv2D(in_channels=3, out_channels=2, kernel_size=3, stride=2, padding=1)
# Set the bias: bias=1 for the first kernel group, bias=0 for the second
conv2d.bias = nn.Parameter(bias)
# Set the weights (assign the kernel matrices)
conv2d.weight = nn.Parameter(weight)
# Output channels = number of kernels = out_channels = 2
# Output height = (input height + 2 * padding - kernel height) // stride + 1 = (5 + 2 - 3) // 2 + 1 = 3
# Output width  = (input width + 2 * padding - kernel width) // stride + 1 = (5 + 2 - 3) // 2 + 1 = 3
print("inputs shape:",inputs.shape)
outputs = conv2d(inputs)
print("Conv2D outputs shape:",outputs.shape)

# Compare with the PyTorch API; the constructor arguments must match the weight shape [2, 3, 3, 3]
conv2d_pytorch = nn.Conv2d(in_channels=3, out_channels=2, kernel_size=3, stride=2, padding=1)
# Replace the randomly initialized parameters with the custom ones
# (assigning a new nn.Parameter does not need a torch.no_grad() context)
conv2d_pytorch.weight = nn.Parameter(weight)
conv2d_pytorch.bias = nn.Parameter(bias)
outputs_pytorch = conv2d_pytorch(inputs)

# Result of the custom operator
print('Conv2D outputs:', outputs)
# Result of the PyTorch API
print('nn.Conv2D outputs:', outputs_pytorch)

Output:

inputs shape: torch.Size([1, 3, 5, 5])
Conv2D outputs shape: torch.Size([1, 2, 3, 3])
Conv2D outputs: tensor([[[[ 6.,  7.,  5.],
          [ 3., -1., -1.],
          [ 2., -1.,  4.]],

         [[ 2., -5., -8.],
          [ 1., -4., -4.],
          [ 0., -5., -5.]]]], grad_fn=<StackBackward0>)
nn.Conv2D outputs: tensor([[[[ 6.,  7.,  5.],
          [ 3., -1., -1.],
          [ 2., -1.,  4.]],

         [[ 2., -5., -8.],
          [ 1., -4., -4.],
          [ 0., -5., -5.]]]], grad_fn=<ConvolutionBackward0>)

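The output shows that the custom convolution operator and the API call produce the same result, which also matches the hand calculation. Along the way I learned how to modify the weight and bias parameters of a PyTorch convolution. (I used to think they could not be changed...)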
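Besides assigning a new nn.Parameter, the values can also be overwritten in place with copy_ inside torch.no_grad(). This keeps the original Parameter objects, which may matter if an optimizer was already created from them. The sketch below uses made-up all-ones weights rather than the kernels from this experiment.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=2, kernel_size=3, stride=2, padding=1)
new_weight = torch.ones(2, 3, 3, 3)  # illustrative values only
new_bias = torch.tensor([1., 0.])

weight_param = conv.weight  # keep a handle to check object identity afterwards
with torch.no_grad():
    # In-place copy: autograd must not record this write, hence no_grad
    conv.weight.copy_(new_weight)
    conv.bias.copy_(new_bias)

x = torch.ones(1, 3, 5, 5)
y = conv(x)
print(y.shape)                      # torch.Size([1, 2, 3, 3])
print(conv.weight is weight_param)  # True: same Parameter object, new values
print(y[0, 0, 1, 1].item())         # 27 ones summed plus bias 1 = 28.0
```

The center output of the first channel sums a full 3×3×3 window of ones and adds the bias of 1, while corner outputs are smaller because part of their window falls on the zero padding.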