This week I watched the image-processing course by the Bilibili uploader 霹雳吧啦Wz.
Course link: 霹雳吧啦Wz的个人空间-霹雳吧啦Wz个人主页-哔哩哔哩视频
Below is a summary of the lessons covered this week.
Code implementation of MobileNet V2
1. Define the ConvBNReLU class, which wraps the convolution, batch normalization, and ReLU6 activation into one module. The padding is computed so that a stride-1 convolution keeps the spatial size of the feature map unchanged.
import torch
import torch.nn as nn

class ConvBNReLU(nn.Sequential):
    def __init__(self, in_channel, out_channel, kernel_size=3, stride=1, groups=1):
        padding = (kernel_size - 1) // 2  # keeps the spatial size unchanged for stride 1
        super(ConvBNReLU, self).__init__(
            nn.Conv2d(in_channel, out_channel, kernel_size, stride, padding, groups=groups, bias=False),
            nn.BatchNorm2d(out_channel),
            nn.ReLU6(inplace=True)
        )
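A quick shape check (the sizes are arbitrary, chosen for illustration): with stride 1, ConvBNReLU changes the channel count but preserves the spatial size.
layer = ConvBNReLU(in_channel=3, out_channel=16, kernel_size=3, stride=1)
x = torch.randn(1, 3, 224, 224)
print(layer(x).shape)  # torch.Size([1, 16, 224, 224])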
2. Define the InvertedResidual class, i.e., the inverted residual block. First, the hidden-layer depth is computed from the expansion factor. A shortcut connection exists only when the stride is 1 and the input and output channel counts are equal. The block then applies a 1×1 expansion convolution (skipped when the expansion factor is 1), followed by a DW (depthwise) convolution, in which the input and output channel counts are identical and the number of groups equals the channel count. Next comes the PW (pointwise) convolution; since the PW convolution is followed by a linear activation, no activation layer needs to be written in code, and batch normalization is applied at the end.
During forward propagation, if the stride is 1 and the input and output channels match, a shortcut connection is made and the input is added to the block's output; otherwise the output of the inverted residual block is returned directly.
class InvertedResidual(nn.Module):
    # expand_ratio: expansion factor
    def __init__(self, in_channel, out_channel, stride, expand_ratio):
        super(InvertedResidual, self).__init__()
        hidden_channel = in_channel * expand_ratio  # number of kernels in the first conv layer
        self.use_shortcut = stride == 1 and in_channel == out_channel

        layers = []
        if expand_ratio != 1:
            # 1x1 PW expansion conv
            layers.append(ConvBNReLU(in_channel, hidden_channel, kernel_size=1))
        layers.extend([
            # 3x3 DW conv
            ConvBNReLU(hidden_channel, hidden_channel, stride=stride, groups=hidden_channel),
            # 1x1 PW conv (linear activation == no activation layer added)
            nn.Conv2d(hidden_channel, out_channel, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channel)
        ])
        self.conv = nn.Sequential(*layers)

    def forward(self, x):
        if self.use_shortcut:
            return x + self.conv(x)
        else:
            return self.conv(x)
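A short usage sketch (channel counts assumed for illustration): with stride 1 and equal input/output channels the shortcut branch is taken, and the output shape matches the input.
blk = InvertedResidual(in_channel=32, out_channel=32, stride=1, expand_ratio=6)
x = torch.randn(1, 32, 56, 56)
print(blk.use_shortcut)  # True
print(blk(x).shape)      # torch.Size([1, 32, 56, 56])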
3. Define the MobileNetV2 class, i.e., the full network. First, the initial and final output channel counts are computed by passing them through the _make_divisible function, which rounds a channel count to the nearest multiple of 8. Then each block's expansion factor t, output channel count c, repeat count n, and stride s are configured, with the exact values given in the inverted_residual_setting list below.
Next, an empty features list is defined. The first convolution layer is appended, and then a loop iterates over each block's expansion factor, output channel count, repeat count, and stride. Note that a stride of 2 applies only to the first layer of a block; all remaining layers use stride 1. Finally, a global average pooling layer and a fully connected layer are defined, whose output dimension equals the number of image classes.
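The class below calls _make_divisible, which this post only lists later in the MobileNet V3 section; it is reproduced here so the V2 code runs on its own:
def _make_divisible(ch, divisor=8, min_ch=None):
    # round ch to the nearest multiple of divisor (see the MobileNet V3 section for details)
    if min_ch is None:
        min_ch = divisor
    new_ch = max(min_ch, int(ch + divisor / 2) // divisor * divisor)
    if new_ch < 0.9 * ch:
        new_ch += divisor
    return new_ch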
class MobileNetV2(nn.Module):
    # alpha: width multiplier for the number of kernels
    def __init__(self, num_classes=1000, alpha=1.0, round_nearest=8):
        super(MobileNetV2, self).__init__()
        block = InvertedResidual
        # round the channel counts to the nearest multiple of 8
        input_channel = _make_divisible(32 * alpha, round_nearest)  # kernels of the first conv layer, i.e. the depth fed to the next layer
        last_channel = _make_divisible(1280 * alpha, round_nearest)

        inverted_residual_setting = [
            # t, c, n, s
            [1, 16, 1, 1],
            [6, 24, 2, 2],
            [6, 32, 3, 2],
            [6, 64, 4, 2],
            [6, 96, 3, 1],
            [6, 160, 3, 2],
            [6, 320, 1, 1],
        ]

        features = []
        # conv1 layer
        features.append(ConvBNReLU(3, input_channel, stride=2))  # kernel_size defaults to 3
        # building inverted residual blocks
        for t, c, n, s in inverted_residual_setting:
            output_channel = _make_divisible(c * alpha, round_nearest)
            for i in range(n):
                stride = s if i == 0 else 1  # stride s only for the first layer of each block
                features.append(block(input_channel, output_channel, stride, expand_ratio=t))
                input_channel = output_channel
        # building the last several layers
        features.append(ConvBNReLU(input_channel, last_channel, 1))
        # combine feature layers
        self.features = nn.Sequential(*features)

        # building classifier
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.classifier = nn.Sequential(
            nn.Dropout(0.2),
            nn.Linear(last_channel, num_classes),
        )

        # weight initialization
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x
4. Training. The data preprocessing, training set, and test set are the same as before; the difference is that the MobileNet V2 network is instantiated and all weights except those of the fully connected layer are frozen.
First download the pretrained weights for MobileNet V2 from:
https://download.pytorch.org/models/mobilenet_v2-7ebf99e0.pth
net = MobileNetV2(num_classes=5)
model_weight_path = 'mobilenet_v2-pre.pth'
pre_weights = torch.load(model_weight_path)
pre_dict = {k: v for k, v in pre_weights.items() if "classifier" not in k}
# load all weights except the final classifier layer
missing_keys, unexpected_keys = net.load_state_dict(pre_dict, strict=False)
print('missing_keys: {}, unexpected_keys: {}'.format(missing_keys, unexpected_keys))
# freeze the feature-extraction weights
for param in net.features.parameters():
    param.requires_grad = False
net.to(device)
'''
missing_keys: ['classifier.1.weight', 'classifier.1.bias'], unexpected_keys: []
'''
5. Define the optimizer; the only parameters it updates are those of the fully connected layer.
params = [p for p in net.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(params, lr=0.0001)
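As a quick check (illustrative, not from the original post), counting the trainable parameters confirms that only the classifier weights will be updated:
# only classifier.1.weight and classifier.1.bias still require grad
trainable = sum(p.numel() for p in net.parameters() if p.requires_grad)
print('trainable parameters:', trainable)  # 1280*5 + 5 = 6405 for num_classes=5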
6. The model training code is the same as before; the results after 5 epochs of training are shown below.
'''
epoch [1/5], loss: 1.369: 100%|██████████| 207/207 [00:28<00:00, 7.20it/s]
epoch [1/5], acc: 280.0]: 100%|██████████| 23/23 [00:03<00:00, 7.39it/s]
epoch [1/5], train loss: 1.4112, val acc: 0.7692
Saving new best model
epoch [2/5], loss: 1.058: 100%|██████████| 207/207 [00:15<00:00, 13.05it/s]
epoch [2/5], acc: 299.0]: 100%|██████████| 23/23 [00:01<00:00, 11.91it/s]
epoch [2/5], train loss: 1.1245, val acc: 0.8214
Saving new best model
epoch [3/5], loss: 0.790: 100%|██████████| 207/207 [00:16<00:00, 12.75it/s]
epoch [3/5], acc: 297.0]: 100%|██████████| 23/23 [00:01<00:00, 13.01it/s]
epoch [3/5], train loss: 0.9590, val acc: 0.8159
epoch [4/5], loss: 1.184: 100%|██████████| 207/207 [00:16<00:00, 12.90it/s]
epoch [4/5], acc: 307.0]: 100%|██████████| 23/23 [00:01<00:00, 12.48it/s]
epoch [4/5], train loss: 0.8449, val acc: 0.8434
Saving new best model
epoch [5/5], loss: 0.813: 100%|██████████| 207/207 [00:15<00:00, 13.59it/s]
epoch [5/5], acc: 301.0]: 100%|██████████| 23/23 [00:02<00:00, 11.07it/s]
epoch [5/5], train loss: 0.7603, val acc: 0.8269
Finished Training
'''
7. After training, run prediction: load the model parameters and predict a single image.
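The prediction code below assumes an image tensor img and a class-index mapping json_str prepared earlier; a minimal sketch of that setup (the file names and transform parameters are assumptions, following the usual pipeline in this course):
import json
from PIL import Image
from torchvision import transforms

data_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])
img = data_transform(Image.open('tulip.jpg')).unsqueeze(0)  # add the batch dimension
with open('class_indices.json', 'r') as f:
    json_str = json.load(f)  # e.g. {"0": "daisy", ..., "4": "tulips"}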
model = MobileNetV2(num_classes=5)
model_weight_path = './MobileNetV2.pth'
model.load_state_dict(torch.load(model_weight_path, map_location=torch.device('cpu')))
model.eval()
with torch.no_grad():
    output = model(img)
    output = torch.squeeze(output, dim=0)
    predict = torch.softmax(output, dim=-1)
    idx = torch.argmax(predict, dim=-1).numpy()
print('classes: {}, predict result: {:.3f}'.format(json_str[str(idx)], predict[idx].item()))
'''
classes:tulips, predict result:0.872
'''
Image classification with the MobileNet V3 network
MobileNet V3 is the third generation of the MobileNet family, further optimizing computational efficiency and performance. Compared with MobileNet V2, its three main improvements are:
- An updated block (the paper calls the MobileNet V3 block "bneck").
- Parameters found with NAS (neural architecture search).
- A redesign of the time-consuming layers.
Experiments show that MobileNet V3 is both more accurate and more efficient than MobileNet V2 and other networks.
MobileNet V3 updates the block: compared with MobileNet V2 it adds an SE (squeeze-and-excitation) module, also called an attention mechanism, and it updates the activation functions.
In the MobileNet V2 block, a 1×1 convolution first expands the channels, then a 3×3 DW convolution is applied, and finally a 1×1 convolution reduces the channels.
In the MobileNet V3 block, i.e. the bneck, a 1×1 convolution likewise expands the channels, followed by a 3×3 DW convolution. If the attention mechanism is applied, each channel is then pooled and fed through two fully connected layers: the first has one quarter as many nodes as the input channels, the second has as many nodes as the input channels. The resulting output vector holds the per-channel weights, which are multiplied with the feature map produced by the DW convolution to give the reweighted output; finally a 1×1 convolution reduces the channels.
The SE module works as follows. Suppose there are 2 input channels. Step 1: average-pool each channel, producing a vector of 2 values. Step 2: apply the two fully connected layers; the first has one quarter as many output nodes as input channels with a ReLU activation, the second has as many output nodes as input channels with an h-sigmoid activation, yielding 2 values that act as channel weights. Step 3: multiply each channel of the original input feature map by its weight to obtain the final output.
MobileNet V3 redesigns the time-consuming layers as follows:
- The number of kernels in the first convolution layer is reduced from 32 to 16.
- The Last Stage is simplified, removing redundant operations.
These changes simplify the network while improving accuracy.
To make computation and differentiation cheaper, the activation functions were also redesigned. The sigmoid function is relatively expensive to compute and differentiate, so h-sigmoid was designed as a replacement, which in turn gives the h-swish activation. The hard variants are also friendlier to quantization, i.e. to converting high-precision floating-point values into low-precision fixed-point values.
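From the MobileNet V3 paper, the two hard activations are defined as:

$$\mathrm{h\text{-}sigmoid}(x) = \frac{\mathrm{ReLU6}(x+3)}{6}, \qquad \mathrm{h\text{-}swish}(x) = x \cdot \frac{\mathrm{ReLU6}(x+3)}{6}$$

Both are piecewise linear, cheap to compute and differentiate, and closely approximate sigmoid and swish respectively.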
The architecture of MobileNet V3-Large is summarized in a table (shown as an image in the original post), in which:
- Input is the input shape
- Operator is the operation performed
- exp size is the number of channels to expand to
- #out is the number of output channels
- SE marks whether the SE module (the attention mechanism) is used
- NL is the non-linear activation used: HS for h-swish, RE for ReLU
- s is the stride of the operation
Note that in the first bneck the expanded dimension equals the input dimension, so only this first bneck has no 1×1 expansion convolution and goes straight to the DW convolution.
A shortcut connection exists only when the stride is 1 and the input dimension equals the output dimension, which guarantees that the input and output shapes match.
The NBN entries at the end indicate that batch normalization is not used in those layers.
(The MobileNet V3-Large architecture table from the paper is not reproduced here.)
The architecture of MobileNet V3-Small follows the same table format (its table, also an image in the original post, is likewise omitted).
Code implementation of MobileNet V3
1. Define the _make_divisible function, which rounds a channel count to the nearest multiple of 8.
def _make_divisible(ch, divisor=8, min_ch=None):
    if min_ch is None:
        min_ch = divisor
    # round to the nearest multiple of divisor
    new_ch = max(min_ch, int(ch + divisor / 2) // divisor * divisor)
    # make sure that rounding down does not lose more than 10% of the channels
    if new_ch < 0.9 * ch:
        new_ch += divisor
    return new_ch
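Two illustrative calls (input values chosen arbitrarily):
print(_make_divisible(32 * 0.75))  # 24: already a multiple of 8
print(_make_divisible(50))         # 48: nearest multiple of 8 that keeps at least 90% of 50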
2. Define the ConvBNActivation class, which bundles the convolution, normalization, and activation operations.
class ConvBNActivation(nn.Sequential):
    def __init__(self, in_channel, out_channel, kernel_size=3, stride=1, groups=1, norm_layer=None,
                 activation_layer=None):
        padding = (kernel_size - 1) // 2  # keeps the spatial size unchanged for stride 1
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        if activation_layer is None:
            activation_layer = nn.ReLU6
        super(ConvBNActivation, self).__init__(
            nn.Conv2d(in_channel, out_channel, kernel_size, stride, padding, groups=groups, bias=False),
            norm_layer(out_channel),
            activation_layer(inplace=True)
        )
3. Define the SE attention module. It consists of two fully connected layers: the first has one quarter as many output nodes as input channels and uses a ReLU activation; the second has as many output nodes as input channels and uses an h-sigmoid activation.
import torch.nn.functional as F

class SqueezeExcitation(nn.Module):
    def __init__(self, input_c, squeeze_factor=4):
        super(SqueezeExcitation, self).__init__()
        squeeze_c = _make_divisible(input_c // squeeze_factor, 8)
        # 1x1 convolutions act as the two fully connected layers
        self.fc1 = nn.Conv2d(input_c, squeeze_c, kernel_size=1)
        self.fc2 = nn.Conv2d(squeeze_c, input_c, kernel_size=1)

    def forward(self, x):
        scale = F.adaptive_avg_pool2d(x, (1, 1))  # global average pooling per channel
        scale = self.fc1(scale)
        scale = F.relu(scale, inplace=True)
        scale = self.fc2(scale)
        scale = F.hardsigmoid(scale, inplace=True)
        return scale * x  # reweight each channel of the input
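A quick shape check (sizes assumed for illustration): the SE module returns a tensor of the same shape as its input, with each channel rescaled by its learned weight.
se = SqueezeExcitation(input_c=64)  # squeeze_c = _make_divisible(64 // 4, 8) = 16
x = torch.randn(1, 64, 28, 28)
print(se(x).shape)  # torch.Size([1, 64, 28, 28])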
4. Define the InvertedResidualConfig class, which holds the parameter configuration of each bneck in MobileNet V3.
class InvertedResidualConfig:
    def __init__(self, input_c, kernel, expand_c, out_c, use_se, activation, stride, width_multi=1.0):
        self.input_c = self.adjust_channels(input_c, width_multi)
        self.kernel = kernel
        self.expand_c = self.adjust_channels(expand_c, width_multi)
        self.out_c = self.adjust_channels(out_c, width_multi)
        self.use_se = use_se
        self.use_hs = activation == 'HS'  # HS: h-swish activation, RE: ReLU
        self.stride = stride

    @staticmethod
    def adjust_channels(channels, width_multi):
        return _make_divisible(channels * width_multi, 8)
5. Define the InvertedResidual class, which builds the block; in MobileNet V3 the block is the bneck.
class InvertedResidual(nn.Module):
    def __init__(self, cnf, norm_layer):
        super(InvertedResidual, self).__init__()
        self.use_shortcut = cnf.stride == 1 and cnf.input_c == cnf.out_c

        layers = []
        activation_layer = nn.Hardswish if cnf.use_hs else nn.ReLU
        # 1x1 expansion conv (skipped when no expansion is needed, i.e. the first bneck)
        if cnf.expand_c != cnf.input_c:
            layers.append(ConvBNActivation(cnf.input_c, cnf.expand_c, kernel_size=1, norm_layer=norm_layer,
                                           activation_layer=activation_layer))
        # DW conv
        layers.append(
            ConvBNActivation(cnf.expand_c, cnf.expand_c, kernel_size=cnf.kernel, stride=cnf.stride,
                             groups=cnf.expand_c, norm_layer=norm_layer, activation_layer=activation_layer))
        if cnf.use_se:
            layers.append(SqueezeExcitation(cnf.expand_c))
        # 1x1 PW conv with linear activation
        layers.append(
            ConvBNActivation(cnf.expand_c, cnf.out_c, kernel_size=1, norm_layer=norm_layer,
                             activation_layer=nn.Identity))
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        result = self.block(x)
        if self.use_shortcut:
            result += x
        return result
6. Define the MobileNetV3 class, the complete MobileNet V3 architecture:
class MobileNetV3(nn.Module):
    def __init__(self, inverted_residual_setting, last_channel, num_classes=1000, block=None, norm_layer=None):
        super(MobileNetV3, self).__init__()
        if block is None:
            block = InvertedResidual
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d

        layers = []
        # first conv layer
        first_output_c = inverted_residual_setting[0].input_c
        layers.append(
            ConvBNActivation(in_channel=3, out_channel=first_output_c, kernel_size=3, stride=2,
                             norm_layer=norm_layer, activation_layer=nn.Hardswish))
        # bneck blocks
        for cnf in inverted_residual_setting:
            layers.append(block(cnf, norm_layer=norm_layer))
        # last conv layer
        lastconv_input_c = inverted_residual_setting[-1].out_c
        lastconv_output_c = 6 * lastconv_input_c
        layers.append(ConvBNActivation(lastconv_input_c, lastconv_output_c, kernel_size=1,
                                       norm_layer=norm_layer, activation_layer=nn.Hardswish))
        self.features = nn.Sequential(*layers)
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Sequential(
            nn.Linear(lastconv_output_c, last_channel),
            nn.Hardswish(inplace=True),
            nn.Dropout(0.2, inplace=True),
            nn.Linear(last_channel, num_classes)
        )

        # weight initialization
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x
7. Define the mobilenet_v3_large function, which fills in each block's parameters and instantiates MobileNet V3.
def mobilenet_v3_large(num_classes=1000):
    bneck_conf = InvertedResidualConfig
    inverted_residual_setting = [
        # input_c, kernel, expand_c, out_c, use_se, activation, stride
        bneck_conf(16, 3, 16, 16, False, "RE", 1),
        bneck_conf(16, 3, 64, 24, False, "RE", 2),  # C1
        bneck_conf(24, 3, 72, 24, False, "RE", 1),
        bneck_conf(24, 5, 72, 40, True, "RE", 2),  # C2
        bneck_conf(40, 5, 120, 40, True, "RE", 1),
        bneck_conf(40, 5, 120, 40, True, "RE", 1),
        bneck_conf(40, 3, 240, 80, False, "HS", 2),  # C3
        bneck_conf(80, 3, 200, 80, False, "HS", 1),
        bneck_conf(80, 3, 184, 80, False, "HS", 1),
        bneck_conf(80, 3, 184, 80, False, "HS", 1),
        bneck_conf(80, 3, 480, 112, True, "HS", 1),
        bneck_conf(112, 3, 672, 112, True, "HS", 1),
        bneck_conf(112, 5, 672, 160, True, "HS", 2),  # C4
        bneck_conf(160, 5, 960, 160, True, "HS", 1),
        bneck_conf(160, 5, 960, 160, True, "HS", 1),
    ]
    last_channel = 1280
    return MobileNetV3(inverted_residual_setting=inverted_residual_setting, last_channel=last_channel,
                       num_classes=num_classes)
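A quick sanity check of the assembled network (a 224×224 input is assumed, matching the standard training size):
net = mobilenet_v3_large(num_classes=5)
net.eval()  # eval mode so BatchNorm uses running stats for a single-image forward pass
x = torch.randn(1, 3, 224, 224)
print(net(x).shape)  # torch.Size([1, 5])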
8. Training. The data preprocessing, training set, and test set are the same as before; the difference is that MobileNet V3 is instantiated and all weights except those of the fully connected layers are frozen.
First download the pretrained weights for MobileNet V3 from: https://download.pytorch.org/models/mobilenet_v3_large-8738ca79.pth
net = mobilenet_v3_large(num_classes=5)
model_weight_path = '../mobilenet_v3_large-pre.pth'
pre_weights = torch.load(model_weight_path)
# load all weights except the classifier layers
pre_dict = dict((k, v) for k, v in pre_weights.items() if 'classifier' not in k)
missing_keys, unexpected_keys = net.load_state_dict(pre_dict, strict=False)
print('missing keys:{}, unexpected keys:{}'.format(missing_keys, unexpected_keys))
# freeze the feature-extraction weights
for param in net.features.parameters():
    param.requires_grad = False
net.to(device)
'''
missing keys:['classifier.0.weight', 'classifier.0.bias', 'classifier.3.weight', 'classifier.3.bias'], unexpected keys:[]
'''
9. Define the loss function and the optimizer; the only parameters the optimizer updates are those of the fully connected layers.
loss_function = nn.CrossEntropyLoss()
params = [p for p in net.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(params, lr=0.0001)
10. The model training code is the same as before; the results after 10 epochs of training are shown below.
'''
epoch [1/10], loss: 0.245: 100%|██████████| 207/207 [00:26<00:00, 7.95it/s]
epoch [1/10], acc: 314.0]: 100%|██████████| 23/23 [00:02<00:00, 7.88it/s]
epoch [1/10], train loss: 0.9173, val acc: 0.8626
Saving new best model
epoch [2/10], loss: 0.528: 100%|██████████| 207/207 [00:15<00:00, 13.42it/s]
epoch [2/10], acc: 321.0]: 100%|██████████| 23/23 [00:01<00:00, 12.45it/s]
epoch [2/10], train loss: 0.5085, val acc: 0.8819
Saving new best model
epoch [3/10], loss: 0.571: 100%|██████████| 207/207 [00:15<00:00, 13.12it/s]
epoch [3/10], acc: 319.0]: 100%|██████████| 23/23 [00:01<00:00, 12.62it/s]
epoch [3/10], train loss: 0.4336, val acc: 0.8764
epoch [4/10], loss: 0.585: 100%|██████████| 207/207 [00:15<00:00, 13.30it/s]
epoch [4/10], acc: 322.0]: 100%|██████████| 23/23 [00:02<00:00, 10.67it/s]
epoch [4/10], train loss: 0.4244, val acc: 0.8846
Saving new best model
epoch [5/10], loss: 1.045: 100%|██████████| 207/207 [00:15<00:00, 13.34it/s]
epoch [5/10], acc: 325.0]: 100%|██████████| 23/23 [00:01<00:00, 12.89it/s]
epoch [5/10], train loss: 0.4052, val acc: 0.8929
Saving new best model
epoch [6/10], loss: 0.091: 100%|██████████| 207/207 [00:15<00:00, 13.32it/s]
epoch [6/10], acc: 328.0]: 100%|██████████| 23/23 [00:01<00:00, 12.69it/s]
epoch [6/10], train loss: 0.4073, val acc: 0.9011
Saving new best model
epoch [7/10], loss: 0.582: 100%|██████████| 207/207 [00:15<00:00, 13.22it/s]
epoch [7/10], acc: 328.0]: 100%|██████████| 23/23 [00:01<00:00, 12.50it/s]
epoch [7/10], train loss: 0.3733, val acc: 0.9011
epoch [8/10], loss: 0.182: 100%|██████████| 207/207 [00:16<00:00, 12.59it/s]
epoch [8/10], acc: 332.0]: 100%|██████████| 23/23 [00:01<00:00, 11.95it/s]
epoch [8/10], train loss: 0.3644, val acc: 0.9121
Saving new best model
epoch [9/10], loss: 0.269: 100%|██████████| 207/207 [00:15<00:00, 13.41it/s]
epoch [9/10], acc: 327.0]: 100%|██████████| 23/23 [00:01<00:00, 12.96it/s]
epoch [9/10], train loss: 0.3692, val acc: 0.8984
epoch [10/10], loss: 1.327: 100%|██████████| 207/207 [00:16<00:00, 12.73it/s]
epoch [10/10], acc: 331.0]: 100%|██████████| 23/23 [00:01<00:00, 11.84it/s]
epoch [10/10], train loss: 0.3705, val acc: 0.9093
Finished Training
'''
11. After training, run prediction: load the model parameters and predict a single image.
model = mobilenet_v3_large(num_classes=5)
model_weight_path = './MobileNetV3.pth'
model.load_state_dict(torch.load(model_weight_path, map_location=torch.device('cpu')))
model.eval()
with torch.no_grad():
    output = model(img)
    output = torch.squeeze(output, dim=0)
    predict = torch.softmax(output, dim=-1)
    idx = torch.argmax(predict, dim=-1).numpy()
print('classes: {}, predict: {}'.format(class_indices[str(idx)], predict[idx]))
'''
classes: tulips, predict: 0.9999997615814209
'''
R-CNN
R-CNN (Regions with CNN features) is the pioneering work that applied deep learning to object detection. The R-CNN algorithm consists of 4 steps:
- Use the Selective Search method to generate 1k to 2k region proposals from an image.
- Extract features from each region proposal with a deep network.
- Feed the features into one SVM classifier per class to decide whether the region belongs to that class.
- Use a regressor to refine the positions of the proposal boxes.
Region proposals are generated with the Selective Search algorithm, which segments the image into a set of initial regions, producing many candidate regions from a single image.
For each region proposal, features are extracted with a deep network: the ~2000 proposals are resized to 227×227 pixels and fed into a pretrained AlexNet CNN. AlexNet's first FC layer has 4096 dimensions, so each proposal yields a 4096-dimensional feature vector, giving a 2000×4096 feature matrix.
Each row of this 2000×4096 feature matrix is the feature vector of one proposal after the CNN.
The features are then fed into the per-class SVM classifiers to determine the category: the 2000×4096 feature matrix is multiplied by the 4096×20 weight matrix formed by the 20 SVMs. Each column of the 4096×20 SVM weight matrix is the weight vector of one class, i.e. the classifier for that class. The product is a 2000×20 matrix of scores, where 2000 is the number of proposals and 20 is the number of classes each proposal is scored against. Non-maximum suppression is applied to each column (each class) to remove overlapping proposals, keeping the highest-scoring proposals of that class.
Simply put, non-maximum suppression removes overlapping proposals to find the best boxes of a class.
Specifically, it takes the highest-scoring detection, computes the IoU of every other detection against it, and deletes all detections whose IoU exceeds a given threshold, repeating the process on what remains; a sketch follows.
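A minimal sketch of this per-class NMS procedure (the function names and tensor layout are my own; boxes are (x1, y1, x2, y2) rows and scores would be one column of the 2000×20 score matrix):
def iou(box, boxes):
    # IoU between one box of shape (4,) and a batch of boxes of shape (N, 4)
    x1 = torch.maximum(box[0], boxes[:, 0])
    y1 = torch.maximum(box[1], boxes[:, 1])
    x2 = torch.minimum(box[2], boxes[:, 2])
    y2 = torch.minimum(box[3], boxes[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    keep = []
    order = scores.argsort(descending=True)  # process the highest-scoring box first
    while order.numel() > 0:
        best = order[0]
        keep.append(best.item())
        if order.numel() == 1:
            break
        rest = order[1:]
        # drop every remaining box whose IoU with the best box exceeds the threshold
        order = rest[iou(boxes[best], boxes[rest]) <= iou_threshold]
    return keep
In practice one would call torchvision.ops.nms(boxes, scores, iou_threshold), which implements the same operation.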
The final step uses a regressor to refine the proposal positions: after further filtering, regression produces, for each class, the corrected bounding box with the highest score.
In summary, the R-CNN framework uses the SS (Selective Search) algorithm to find region proposals, a CNN for feature extraction, SVM classifiers for classification, and a regressor to refine the proposal positions, ultimately yielding the best proposal boxes together with class probabilities.
R-CNN has the following problems:
- Slow detection: Selective Search plus a separate CNN forward pass for each of the ~2000 proposals makes testing take tens of seconds per image.
- Slow, multi-stage training: the CNN is fine-tuned first, then the SVMs are trained, then the bounding-box regressors.
- Large storage requirements: the CNN features of every proposal must be written to disk before SVM training.
Personal summary
This week I mainly studied some image-processing methods and theory, as well as image classification with several networks. Next week I will move on to other models, algorithms, and theory, read the corresponding papers, and keep combining theory with practice: keep learning, keep working hard, keep improving!